proposal 141: download server descriptors on demand
Nick Mathewson
nickm at freehaven.net
Tue Jul 15 21:29:45 UTC 2008
On Fri, Jul 11, 2008 at 08:22:55PM +0200, Peter Palfrader wrote:
> On Mon, 16 Jun 2008, Nick Mathewson wrote:
[The proposal sayeth:]
> > > 3.1 Load balancing info in consensus documents
> > >
> > > One of the reasons why clients download all server descriptors is for
> > > doing proper load balancing as described in 2.1. In order for
> > > clients to not require all server descriptors this information will
> > > have to move into the network status document.
> > >
> > > [XXX Two open questions here:
> > > a) how do we arrive at a consensus weight?
> >
> > Perhaps the vote could contain the node's bandwidth, and this could be
> > used to calculate the weights? It's necessary that the consensus
> > remain a deterministic function of the votes.
>
> That's one approach. It means however that when we want to tweak the
> weighting algorithm we have to introduce new consensus methods.
>
> The other approach is that every voter assigns weights for each of the
> purposes (Exit, Guard, ..) so that their total sum is some constant X.
> When building a consensus we take the median for each purpose for each
> router.
That latter approach seems so reasonable that I can't reconstruct why
I suggested anything else.
If there is an even number of votes, we need to specify that we mean
the low median.
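
For concreteness, here's a rough sketch (Python, not anything from the
tor source) of taking the low median per router and purpose.  The vote
layout and purpose names are just made up for illustration:

    # Sketch: combine per-router, per-purpose weights from several votes
    # into consensus weights using the low median.  The input layout and
    # purpose names are invented for illustration only.

    def low_median(values):
        # For an even count, take the lower of the two middle values so
        # the result stays a deterministic function of the votes.
        values = sorted(values)
        return values[(len(values) - 1) // 2]

    def consensus_weights(votes, purposes=("Exit", "Guard", "Middle")):
        # votes: list of {fingerprint: {purpose: weight}} dicts, one per voter.
        routers = set()
        for vote in votes:
            routers.update(vote)
        result = {}
        for fp in routers:
            result[fp] = {
                p: low_median([v[fp].get(p, 0) for v in votes if fp in v])
                for p in purposes
            }
        return result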
> > > b) how to represent weights in the consensus?
> > > Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
> >
> > That would break backward compatibility. Adding a new per-router line
> > instead would probably be better. We should play with representations
> > here till we wind up with something compressible, and we should figure
> > out the space impact of doing this.
>
> How about something like "w Exit=41 Guard=94 Middle=543"
Sure. Let's have any absent weight default to 0.
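
Something like the following sketch (again Python, keyword names just
for illustration) is all a client would need to parse that, with any
missing purpose defaulting to 0:

    # Sketch: parse a consensus line like "w Exit=41 Guard=94 Middle=543".
    # Purposes that are absent default to 0; keyword names are illustrative.

    def parse_w_line(line, purposes=("Exit", "Guard", "Middle")):
        weights = dict.fromkeys(purposes, 0)
        parts = line.split()
        if not parts or parts[0] != "w":
            raise ValueError("not a w line")
        for item in parts[1:]:
            key, _, value = item.partition("=")
            if key in weights:
                weights[key] = int(value)
        return weights

    # parse_w_line("w Exit=41 Guard=94")
    #   -> {'Exit': 41, 'Guard': 94, 'Middle': 0}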
[...]
> > My first thought was that I'd prefer to avoid multiplying machinery
> > here. When we design RELAY_REQUEST_SD, let's try to keep looking to
> > see whether we can add a padding argument to RELAY_BEGIN_DIR rather
> > than forcing a new relay cell type?
>
> A new cell type saves on round trip times. Using BEGIN_DIR means we
> send the cell, get a "CONNECTED" back, send the request, get a result.
>
> Using a dedicated cell type will cut that in half.
Ok, reasonable. Though perhaps this means that BEGIN_DIR should
optionally take a body containing the start of the HTTP request. But that
may be fodder for another proposal.
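
To spell out where the round trip goes, a rough sketch of the two flows
for fetching C's descriptor from B (not the real cell handling; only
RELAY_BEGIN_DIR and the proposed RELAY_REQUEST_SD names come from this
discussion, and the reply format is still undefined):

    # Sketch: message flows for fetching C's descriptor from relay B.

    begin_dir_flow = [
        "client -> B : RELAY_BEGIN_DIR",
        "B -> client : RELAY_CONNECTED",
        "client -> B : RELAY_DATA (HTTP GET for C's descriptor)",
        "B -> client : RELAY_DATA (descriptor)",
    ]  # two round trips before descriptor data starts arriving

    request_sd_flow = [
        "client -> B : RELAY_REQUEST_SD (C's identity)",
        "B -> client : (descriptor; reply format TBD)",
    ]  # one round trip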
> > Something else to figure out here is migration. When the first cut of
> > this system is done, only new servers will support RELAY_REQUEST_SD.
> > This means that clients will still need to pre-download descriptors
> > under some circumstances.
> >
> > In fact, the rules will be pretty weird here. If extends are done by
> > first asking B for C's descriptor, then clients need to know whether B
> > supports RELAY_REQUEST_SD. If it doesn't, they need to have C's
> > descriptor, which means they need to have downloaded it in advance.
> >
> > In its final version, this proposal needs a migration plan.
>
> We could fall back to BEGIN_DIR, but that only helps when B is a
> directory cache.
>
> One approach is to introduce server support first, and hold off on client
> use until a significant part of the network supports it. Unless we
> can come up with something smarter.
This seems fine for now, unless (as you say) we come up with something
smarter.
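
So during migration the client-side choice would look roughly like this
sketch (names like supports_request_sd and is_dir_cache are invented
for illustration):

    # Sketch of the client's options when it wants to extend through B to C
    # during the migration period.  Attribute names are invented.

    def descriptor_plan(B, C, have_descriptor):
        if have_descriptor(C):
            return "already have C's descriptor; extend directly"
        if B.supports_request_sd:
            return "ask B with RELAY_REQUEST_SD"
        if B.is_dir_cache:
            return "fall back to BEGIN_DIR to B"
        return "needed C's descriptor in advance; pick a different path"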
[...]
> Theory: Most routers use one of a very small set of different exit policies (if
> we think of a router's own IP address in its exit policy as a single token
> @@IP@@ or whatever).
>
> Maybe the consensus document should include a hash of the (normalized,
> i.e. the router's IP replaced with a token) exit policy.
>
> For instance the current consensus has the following exit policies:
> 750 ea118b6480d664e6b65363d6330f62d76312309d -
> 606 6c327099a4eb1ff45128d36643d7494d706e1736 -
> 200 cc5e5310ca7fc1f22b0ce1d017817934889e424a -
> 38 d640a0b886b61d20f3390f16e68f94bb67ca3e02 -
> 34 04e917f4dfa4fde648fa2e64ed450cf078bba22d -
> 22 24317c522385314806403543dbdc8c4d797c412e -
> 16 c6fca3e85a850ab5ebce7e579739c30077b1e967 -
> 15 c1ca7a48e410911356a8fc585744fb3ffeab99da -
> 12 cc44ee0df0f2a42868e65150e11d16274f71cd4c -
> 9 5c74594ad9dd4df0bd40e1e2cd3fe7f2906657cf -
> 8 daad277346eac48f247411dc60e2020a767861f4 -
> 8 63fa7a8e07fbf314abb52b19c67a6b773052d0d9 -
> 7 b2b8e7d147a05258bf375407dc29abeeeda9e25f -
> 6 a5ac5094b4a3fb116fe99cebba0787609d7c99b9 -
> 6 a29974dcf97a012726710ced75eab63ff95383dc -
> 6 3c0d8576d935aae8c6c296216ac559ced03a612f -
> 6 20c7b1b0914ce4f5817b58912723d5e26974a373 -
> i.e. ea11.. exists 750 times, 6c.. 606 times etc.
>
>
> Now when a tor client first starts it will pick a route based only on the
> weights it learns from the consensus. In the course of building that
> circuit it will learn a few exit policies.
>
> Next time it needs to build a circuit it can again generate it only
> based on weights. If it is lucky it will already have the exit
> policy (even if not the full SD) of the exit node. Should that policy
> not allow exits to the target that circuit can be abandoned immediately.
>
>
> We will have a hard time exiting when only very few nodes allow
> connections to a specific IP:port. But maybe that's ok.
Hmmm. I need to think about this part more. In particular, I'm
curious whether we can do better than hashed policies with a
ports/addresses list, but I think one of us will need to actually
build a prototype to see how well this works (or doesn't).
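
To make the hashed-policy idea concrete, the normalization could look
roughly like this sketch (the @@IP@@ token comes from your mail; the
digest function and everything else here is illustrative, not decided):

    # Sketch: replace the router's own address with a placeholder token and
    # hash the normalized policy, so the consensus can refer to the handful
    # of common policies compactly.  Digest choice is illustrative.

    import hashlib

    def normalized_policy_digest(policy_lines, router_ip):
        normalized = [line.replace(router_ip, "@@IP@@") for line in policy_lines]
        return hashlib.sha1("\n".join(normalized).encode("ascii")).hexdigest()

    # A client that has cached the policy behind a digest it sees in the
    # consensus can reject an exit that can't reach its target before it
    # bothers building the circuit.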