draft proposal: download server descriptors on demand
Roger Dingledine
arma at mit.edu
Mon Nov 10 12:24:00 UTC 2008
On Mon, Jun 16, 2008 at 12:06:44AM +0200, Peter Palfrader wrote:
> Tor clients use the information from server descriptors for
> different purposes, which are considered in the following sections.
>
> #three ways: One, to determine if a server will be able to handle
> #this client's request; two, to actually communicate or use the server;
> #three, for load balancing decisions.
> #
> #These three points are considered in the following subsections.
[snip]
> 2.4 Contact/key information
>
> A server descriptor lists a server's IP address and TCP ports on which
> it accepts onion and directory connections. Furthermore it contains
> the onion key, a short lived RSA key to which clients encrypt CREATE
> cells.
>
> 3.2 Fetching descriptors on demand
>
> As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
> and the onion key for a server.
>
> A client already knows the IP address and the ports from the consensus
> documents, but without the onion key it will not be able to send
> CREATE/EXTEND cells for that server. Since the client needs the onion
> key it needs the descriptor.
[snip]
I've been brainstorming with weasel. We hashed out some more details
about how to proceed if we want to finish this piece of proposal 141.
The next step is to spec out a new RELAY_REQUEST_SD relay cell. For
example:
    version          [1 byte]
    stream_id        [2 bytes]
    padding length   [2 bytes]
    SD fingerprint   [20 bytes]
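To make the proposed layout concrete, here is a minimal sketch that packs and unpacks a RELAY_REQUEST_SD body with the four fields above. Network (big-endian) byte order and the function names are assumptions for illustration; the proposal does not fix an encoding yet.

```python
import struct

# Hypothetical wire layout for the proposed RELAY_REQUEST_SD body, following
# the field list above. Big-endian ("network") byte order is an assumption.
REQUEST_SD_FORMAT = "!BHH20s"  # version, stream_id, padding length, SD fingerprint
REQUEST_SD_LEN = struct.calcsize(REQUEST_SD_FORMAT)  # 1 + 2 + 2 + 20 = 25 bytes


def pack_request_sd(version: int, stream_id: int, padding_len: int,
                    fingerprint_hex: str) -> bytes:
    """Encode a RELAY_REQUEST_SD body; the fingerprint is 40 hex digits."""
    fingerprint = bytes.fromhex(fingerprint_hex)
    if len(fingerprint) != 20:
        raise ValueError("SD fingerprint must be 20 bytes")
    return struct.pack(REQUEST_SD_FORMAT, version, stream_id, padding_len, fingerprint)


def unpack_request_sd(body: bytes) -> tuple:
    """Decode a RELAY_REQUEST_SD body back into its four fields."""
    version, stream_id, padding_len, fingerprint = struct.unpack(
        REQUEST_SD_FORMAT, body[:REQUEST_SD_LEN]
    )
    return version, stream_id, padding_len, fingerprint.hex()
```

For example, `pack_request_sd(1, 7, 4096, "ab" * 20)` yields a 25-byte body that asks for a reply padded to 4KB.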
We use a new relay cell type because begin_dir introduces extra round trips
and, worse, doesn't support descriptor padding, so it leaks to an observer
how big a descriptor we just fetched.
Replies to RELAY_REQUEST_SD requests need to be padded to some constant
upper limit in order to conceal a client's destination from anybody who
might be counting cells/bytes. The recommended padding size should be
a public parameter in the consensus. (That means we can tell which dir
the client is using when we just changed that value, but that shouldn't
happen often.)
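One way to pad a reply to a constant size is sketched below. It assumes a hypothetical framing that the proposal does not specify: a 2-byte length prefix, then the (possibly compressed) descriptor, then zero bytes up to the padding length the client asked for, so every reply has the same size on the wire.

```python
import struct

# Hypothetical reply framing (an assumption, not part of the proposal):
# 2-byte big-endian length prefix, descriptor bytes, then zero padding.
LENGTH_PREFIX = struct.Struct("!H")


def pad_reply(descriptor: bytes, padding_len: int) -> bytes:
    """Return exactly padding_len bytes so every reply looks the same size."""
    framed_len = LENGTH_PREFIX.size + len(descriptor)
    if framed_len > padding_len:
        raise ValueError("descriptor does not fit in the padding")
    framed = LENGTH_PREFIX.pack(len(descriptor)) + descriptor
    return framed + bytes(padding_len - framed_len)  # bytes(n) is n zero bytes


def unpad_reply(padded: bytes) -> bytes:
    """Recover the descriptor from a padded reply using the length prefix."""
    (length,) = LENGTH_PREFIX.unpack_from(padded, 0)
    start = LENGTH_PREFIX.size
    return padded[start:start + length]
```

A short descriptor and a long one both come out as 4096 bytes, which is the property that keeps a cell-counting observer from learning which relay's descriptor was fetched.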
For starters, a suitable padding length is 4KB. The current biggest
descriptors (size in bytes, then path) are:
    3640  server-descriptor/1/e/1e2c860f76be78b03bf7d2a4fb7f9fbef726c43f
    3655  server-descriptor/b/e/bea2ce2c31a00046c2e9700b2338eb8c48b148b2
    4069  server-descriptor/9/c/9c116b33ae4c117c07d61815428027a4ba3287ed
    7850  server-descriptor/5/e/5ed3ee0b6f151652ddc0bda6822729c1918c9201
and if we compress the answers, that 7850-byte one compresses to 2362
bytes. (Large descriptors generally have large exit policies, which
compress well.)
The authorities need to stop accepting SDs that don't fit in
the padding. Right now authorities reject SDs that are over 20KB
uncompressed. Should they compress them on the fly when deciding whether
to accept them or send back an error? That seems like a DoS attack waiting
to happen. Should we get relays to compress them before uploading? No,
because then they could upload compression bombs. Our conclusion: a) relays
should test-compress their descriptors first and decline to upload if
they're over the currently published padding size; b) authorities should
try compressing them at their leisure before putting them into a consensus
vote, and drop them if they're too big; and c) the 4KB limit should leave
enough headroom that most relays will have upgraded to the
test-compress-first code before it bites them.
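Rules (a) and (b) could look roughly like this. The sketch assumes zlib/deflate as the compression (which Tor's directory protocol uses for its compressed documents) and the 4KB padding size above; the function names are hypothetical.

```python
import zlib

# Sketch of the "test-compress first" rule. Assumes zlib compression and a
# 4KB published padding size; all function names are hypothetical.
DEFAULT_PADDING = 4096


def fits_padding(descriptor: bytes, padding_len: int = DEFAULT_PADDING) -> bool:
    """True if the descriptor, once compressed, fits in padding_len bytes."""
    return len(zlib.compress(descriptor)) <= padding_len


def relay_should_upload(descriptor: bytes, published_padding: int) -> bool:
    # (a) The relay compresses its own descriptor and declines to upload
    # one that would not fit in the currently published padding size.
    return fits_padding(descriptor, published_padding)


def authority_filter(pending: list, padding_len: int) -> list:
    # (b) The authority re-checks accepted descriptors at its leisure,
    # before voting, and drops the ones that are too big.
    return [d for d in pending if fits_padding(d, padding_len)]
```

The point of the split is that the authority never has to decompress relay-supplied data, so a compression bomb has nothing to detonate.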
Since the client specifies the padding in the RELAY_REQUEST_SD cell, it
can also tell when the response is finished: it's done when the correct
number of relay cells has come back.
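A client-side sketch of that completion check follows. It assumes the reply is exactly the requested padding length and is carried in relay cells of up to 498 data bytes each (Tor's relay payload size); the class name is hypothetical.

```python
# Sketch of detecting the end of a RELAY_REQUEST_SD reply. Assumes the reply
# is exactly padding_len bytes split over relay cells of at most 498 data
# bytes each. The class and function names are hypothetical.
RELAY_PAYLOAD_SIZE = 498


def expected_cells(padding_len: int) -> int:
    """Number of RELAY_ANSWER_SD cells a reply of padding_len bytes needs."""
    return (padding_len + RELAY_PAYLOAD_SIZE - 1) // RELAY_PAYLOAD_SIZE


class SdReplyCollector:
    def __init__(self, padding_len: int) -> None:
        self.padding_len = padding_len
        self.remaining = expected_cells(padding_len)
        self.buf = bytearray()

    def on_cell(self, data: bytes) -> bool:
        """Append one cell's data; return True once the reply is complete."""
        if self.remaining == 0:
            raise ValueError("unexpected extra cell")
        self.buf += data
        self.remaining -= 1
        return self.remaining == 0

    def reply(self) -> bytes:
        # Trim in case the cells carried more than the requested size.
        return bytes(self.buf[: self.padding_len])
```

No end-of-stream marker is needed: the count the client derives from its own request tells it when to stop waiting.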
(We could have picked a lower default padding cap, like 2KB, on the
theory that all reasonable descriptors fit in it. But I figured 4KB
doesn't hurt too much, first because it gives us the breathing room
described above, and second because Tor isn't bad at pushing n+1 cells
when it's already pushing n -- it's the round trips that kill us, not the
throughput.)
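To put numbers on that tradeoff, assume each reply fills relay cells carrying 498 data bytes (Tor's relay payload size). The cell count for a padding cap of P bytes is then:

```latex
n(P) = \left\lceil \frac{P}{498} \right\rceil, \qquad
n(2048) = \left\lceil 4.11 \right\rceil = 5, \qquad
n(4096) = \left\lceil 8.22 \right\rceil = 9
```

So doubling the cap from 2KB to 4KB adds four cells to every reply, all sent on the same circuit without any extra round trip.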
Answering the request is easy for relays: they just split the answer
over a series of RELAY_ANSWER_SD relay cells, and the client pieces them
back together.
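The relay side of that exchange might look like the sketch below, assuming the padded reply is chunked into consecutive pieces of at most 498 bytes (Tor's relay data size), one per RELAY_ANSWER_SD cell. The function names are hypothetical.

```python
# Sketch of splitting a padded reply across RELAY_ANSWER_SD cells and putting
# it back together. Assumes 498-byte relay data payloads; names are hypothetical.
RELAY_PAYLOAD_SIZE = 498


def split_into_cells(reply: bytes) -> list:
    """Relay side: chunk the reply into consecutive relay-cell-sized pieces."""
    return [reply[i:i + RELAY_PAYLOAD_SIZE]
            for i in range(0, len(reply), RELAY_PAYLOAD_SIZE)]


def reassemble(cells: list) -> bytes:
    """Client side: concatenate the pieces in the order they arrived."""
    return b"".join(cells)
```

Because cells on a circuit arrive in order, plain concatenation is enough; a 4096-byte reply becomes eight full cells plus one 112-byte cell.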
How do we build it on the client code-wise? Probably one of those 'linked'
connections like we use for begindir requests is the right move, since
then it'll have a buf already set up that accumulates bytes until the
stream is finished.
So the plan would be a) teach relays how to answer the request, b) teach
clients how to ask it so we can test, c) wait until lots of relays support
it, d) teach clients to ask when the relay is of a sufficient version,
e) teach clients to stop fetching descriptors the normal way.
Somewhere in there, we'd also want to make sure that every relay that
supports this feature mirrors all the descriptors, so they're present
when requested.
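Steps d) and e) of the plan boil down to a per-relay decision on the client. This is a minimal sketch; the minimum version and the flag name are hypothetical, since no version has been chosen yet.

```python
from enum import Enum
from typing import Optional

# Sketch of the client-side choice in steps d) and e). The version threshold
# below is hypothetical; the plan does not name a release.
MIN_REQUEST_SD_VERSION = (0, 2, 2, 0)


class FetchMethod(Enum):
    REQUEST_SD = "ask the relay itself with RELAY_REQUEST_SD"
    DIRECTORY = "fetch descriptors the normal way"


def choose_fetch_method(relay_version: tuple,
                        normal_fetches_disabled: bool) -> Optional[FetchMethod]:
    if relay_version >= MIN_REQUEST_SD_VERSION:
        return FetchMethod.REQUEST_SD  # d) relay is new enough to answer
    if normal_fetches_disabled:
        # e) old relay and no fallback: without its descriptor (and onion
        # key) the client cannot send CREATE/EXTEND cells for it.
        return None
    return FetchMethod.DIRECTORY
```

Tuple comparison handles version ordering, so `(0, 2, 1, 30)` sorts below the hypothetical threshold.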
Ok. So those are my notes on how to proceed in the current direction. In
my next post I will describe a radical departure that is much easier
to build.
--Roger
More information about the tor-dev mailing list