proposal 141: download server descriptors on demand
Nick Mathewson
nickm at freehaven.net
Tue Jul 15 21:29:45 UTC 2008
On Fri, Jul 11, 2008 at 08:22:55PM +0200, Peter Palfrader wrote:
> On Mon, 16 Jun 2008, Nick Mathewson wrote:
[The proposal sayeth:]
> > > 3.1 Load balancing info in consensus documents
> > >
> > > One of the reasons why clients download all server descriptors is for
> > > doing proper load balancing as described in 2.1. In order for
> > > clients to not require all server descriptors this information will
> > > have to move into the network status document.
> > >
> > > [XXX Two open questions here:
> > > a) how do we arrive at a consensus weight?
> >
> > Perhaps the vote could contain the node's bandwidth, and this could be
> > used to calculate the weights? It's necessary that the consensus
> > remain a deterministic function of the votes.
>
> That's one approach. It means however that when we want to tweak the
> weighting algorithm we have to introduce new consensus methods.
>
> The other approach is that every voter assigns weights for each of the
> purposes (Exit, Guard, ..) so that their total sum is some constant X.
> When building a consensus we take the median for each purpose for each
> router.
That latter approach seems so reasonable that I can't reconstruct why
I suggested anything else.
If there is an even number of votes, we need to specify that we mean
the low median.
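
For concreteness, here's a rough sketch (Python, not anything from the
tor source) of taking the low median per router and purpose.  The vote
layout and purpose names are just made up for illustration:

    # Sketch: combine per-router, per-purpose weights from several votes
    # into consensus weights using the low median.  The input layout and
    # purpose names are invented for illustration only.

    def low_median(values):
        # For an even count, take the lower of the two middle values so
        # the result stays a deterministic function of the votes.
        values = sorted(values)
        return values[(len(values) - 1) // 2]

    def consensus_weights(votes, purposes=("Exit", "Guard", "Middle")):
        # votes: list of {fingerprint: {purpose: weight}} dicts, one per voter.
        routers = set()
        for vote in votes:
            routers.update(vote)
        result = {}
        for fp in routers:
            result[fp] = {
                p: low_median([v[fp].get(p, 0) for v in votes if fp in v])
                for p in purposes
            }
        return result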
> > > b) how to represent weights in the consensus?
> > > Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
> >
> > That would break backward compatibility. Adding a new per-router line
> > instead would probably be better. We should play with representations
> > here till we wind up with something compressible, and we should figure
> > out the space impact of doing this.
>
> How about something like "w Exit=41 Guard=94 Middle=543"
Sure. Let's have any absent weight default to 0.
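
Something like the following sketch (again Python, keyword names just
for illustration) is all a client would need to parse that, with any
missing purpose defaulting to 0:

    # Sketch: parse a consensus line like "w Exit=41 Guard=94 Middle=543".
    # Purposes that are absent default to 0; keyword names are illustrative.

    def parse_w_line(line, purposes=("Exit", "Guard", "Middle")):
        weights = dict.fromkeys(purposes, 0)
        parts = line.split()
        if not parts or parts[0] != "w":
            raise ValueError("not a w line")
        for item in parts[1:]:
            key, _, value = item.partition("=")
            if key in weights:
                weights[key] = int(value)
        return weights

    # parse_w_line("w Exit=41 Guard=94")
    #   -> {'Exit': 41, 'Guard': 94, 'Middle': 0}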
[...]
> > My first thought was that I'd prefer to avoid multiplying machinery
> > here. When we design RELAY_REQUEST_SD, let's try to keep looking to
> > see whether we can add a padding argument to RELAY_BEGIN_DIR rather
> > than forcing a new relay cell type?
>
> A new cell type saves on round trip times. Using BEGIN_DIR means we
> send the cell, get a "CONNECTED" back, send the request, get a result.
>
> Using a dedicated cell type will cut that in half.
Ok, reasonable. Though perhaps this means that BEGIN_DIR should
optionally take a body containing the start of the HTTP request. But that
may be fodder for another proposal.
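
To spell out where the round trip goes, a rough sketch of the two flows
for fetching C's descriptor from B (not the real cell handling; only
RELAY_BEGIN_DIR and the proposed RELAY_REQUEST_SD names come from this
discussion, and the reply format is still undefined):

    # Sketch: message flows for fetching C's descriptor from relay B.

    begin_dir_flow = [
        "client -> B : RELAY_BEGIN_DIR",
        "B -> client : RELAY_CONNECTED",
        "client -> B : RELAY_DATA (HTTP GET for C's descriptor)",
        "B -> client : RELAY_DATA (descriptor)",
    ]  # two round trips before descriptor data starts arriving

    request_sd_flow = [
        "client -> B : RELAY_REQUEST_SD (C's identity)",
        "B -> client : (descriptor; reply format TBD)",
    ]  # one round trip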
> > Something else to figure out here is migration. When the first cut of
> > this system is done, only new servers will support RELAY_REQUEST_SD.
> > This means that clients will still need to pre-download descriptors
> > under some circumstances.
> >
> > In fact, the rules will be pretty weird here. If extends are done by
> > first asking B for C's descriptor, then clients need to know whether B
> > supports RELAY_REQUEST_SD. If it doesn't, they need to have C's
> > descriptor, which means they need to have downloaded it in advance.
> >
> > In its final version, this proposal needs a migration plan.
>
> We could fall back to BEGIN_DIR, but that only helps when B is a
> directory cache.
>
> One approach is to introduce server support first, and hold off on client
> use until a significant part of the network supports it. Unless we
> can come up with something smarter.
This seems fine for now, unless (as you say) we come up with something
smarter.
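
So during migration the client-side choice would look roughly like this
sketch (names like supports_request_sd and is_dir_cache are invented
for illustration):

    # Sketch of the client's options when it wants to extend through B to C
    # during the migration period.  Attribute names are invented.

    def descriptor_plan(B, C, have_descriptor):
        if have_descriptor(C):
            return "already have C's descriptor; extend directly"
        if B.supports_request_sd:
            return "ask B with RELAY_REQUEST_SD"
        if B.is_dir_cache:
            return "fall back to BEGIN_DIR to B"
        return "needed C's descriptor in advance; pick a different path"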
[...]
> Theory: Most routers use one of a very small set of different exit policies (if
> we think of a router's own IP address in its exit policy as a single token
> @@IP@@ or whatever).
>
> Maybe the consensus document should include a hash of the (normalized,
> i.e. the router's IP replaced with a token) exit policy.
>
> For instance the current consensus has the following exit policies:
> 750 ea118b6480d664e6b65363d6330f62d76312309d -
> 606 6c327099a4eb1ff45128d36643d7494d706e1736 -
> 200 cc5e5310ca7fc1f22b0ce1d017817934889e424a -
> 38 d640a0b886b61d20f3390f16e68f94bb67ca3e02 -
> 34 04e917f4dfa4fde648fa2e64ed450cf078bba22d -
> 22 24317c522385314806403543dbdc8c4d797c412e -
> 16 c6fca3e85a850ab5ebce7e579739c30077b1e967 -
> 15 c1ca7a48e410911356a8fc585744fb3ffeab99da -
> 12 cc44ee0df0f2a42868e65150e11d16274f71cd4c -
> 9 5c74594ad9dd4df0bd40e1e2cd3fe7f2906657cf -
> 8 daad277346eac48f247411dc60e2020a767861f4 -
> 8 63fa7a8e07fbf314abb52b19c67a6b773052d0d9 -
> 7 b2b8e7d147a05258bf375407dc29abeeeda9e25f -
> 6 a5ac5094b4a3fb116fe99cebba0787609d7c99b9 -
> 6 a29974dcf97a012726710ced75eab63ff95383dc -
> 6 3c0d8576d935aae8c6c296216ac559ced03a612f -
> 6 20c7b1b0914ce4f5817b58912723d5e26974a373 -
> i.e. ea11.. exists 750 times, 6c.. 606 times etc.
>
>
> Now when a tor client first starts it will pick a route based only on the
> weights it learns from the consensus. In the course of building that
> circuit it will learn a few exit policies.
>
> Next time it needs to build a circuit it can again generate it only
> based on weights. If it is lucky it will already have the exit
> policy (even if not the full SD) of the exit node. Should that policy
> not allow exits to the target that circuit can be abandoned immediately.
>
>
> We will have a hard time exiting when only very few nodes allow
> connections to a specific IP:port. But maybe that's ok.
Hmmm. I need to think about this part more. In particular, I'm
curious whether we can do better than hashed policies with a
ports/addresses list, but I think one of us will need to actually
build a prototype to see how well this works (or doesn't).
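
To make the hashed-policy idea concrete, the normalization could look
roughly like this sketch (the @@IP@@ token comes from your mail; the
digest function and everything else here is illustrative, not decided):

    # Sketch: replace the router's own address with a placeholder token and
    # hash the normalized policy, so the consensus can refer to the handful
    # of common policies compactly.  Digest choice is illustrative.

    import hashlib

    def normalized_policy_digest(policy_lines, router_ip):
        normalized = [line.replace(router_ip, "@@IP@@") for line in policy_lines]
        return hashlib.sha1("\n".join(normalized).encode("ascii")).hexdigest()

    # A client that has cached the policy behind a digest it sees in the
    # consensus can reject an exit that can't reach its target before it
    # bothers building the circuit.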