proposal 141: download server descriptors on demand

Fri Jul 11 18:22:55 UTC 2008

On Mon, 16 Jun 2008, Nick Mathewson wrote:

>  
> > 3.1 Load balancing info in consensus documents
> > 
> >   One of the reasons why clients download all server descriptors is for
> >   doing proper load balancing as described in 2.1.  In order for
> >   clients to not require all server descriptors this information will
> >   have to move into the network status document.
> > 
> >   [XXX Two open questions here:
> >    a) how do we arrive at a consensus weight?
> 
> Perhaps the vote could contain the node's bandwidth, and this could be
> used to calculate the weights?  It's necessary that the consensus
> remain a deterministic function of the votes.

That's one approach.  It means however that when we want to tweak the
weighting algorithm we have to introduce new consensus methods.

The other approach is that every voter assigns weights for each of the
purposes (Exit, Guard, ..) so that their total sum is some constant X.
When building a consensus we take the median for each purpose for each
router.
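
A minimal sketch of that second approach, assuming each vote is a
mapping from router identity to per-purpose integer weights.  The name
consensus_weights and the input layout are made up for illustration.
Using the low median (so the result is always one of the voted values)
and ignoring voters that did not list a router are both assumptions,
not something settled above.

```python
from statistics import median_low


def consensus_weights(votes):
    """Combine per-voter weights into one weight per router and purpose.

    votes: list of dicts, one per voter, mapping a router identity to a
    dict such as {"Exit": 41, "Guard": 94, "Middle": 543}.
    """
    routers = set()
    for vote in votes:
        routers.update(vote)

    consensus = {}
    for router in sorted(routers):
        purposes = set()
        for vote in votes:
            purposes.update(vote.get(router, {}))
        consensus[router] = {
            # median_low always returns one of the voted integers, so the
            # result stays a deterministic function of the votes.
            purpose: median_low(
                [vote[router][purpose] for vote in votes
                 if router in vote and purpose in vote[router]]
            )
            for purpose in sorted(purposes)
        }
    return consensus
```

Note that the per-router medians will in general not add up to X again,
so clients should not rely on the consensus weights having a fixed sum.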

> >    b) how to represent weights in the consensus?
> >       Maybe "s Guard=0.13 Exit=0.02 Middle=0.00 Stable.."
> 
> That would break backward compatibility.  Adding a new per-router line
> instead would probably be better.  We should play with representations
> here till we wind up with something compressible, and we should figure
> out the space impact of doing this.

How about something like "w Exit=41 Guard=94 Middle=543"?

  Consensus documents will have a new line per router similar
  to the "r", "s", and "v" lines that already exist.  This line
  will convey weight information to clients.

   "w Exit=41 Guard=94 Middle=543 ..."

  It starts with the letter w and then contains any number of Key=Value
  pairs.  Values will be non-negative integers.  Clients will pick
  routers with a probability proportional to the number for the intended
  purpose.

  Clients MUST accept sums of all weights for a given purpose over all
  routers in a consensus up to UINT64_MAX.
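
A rough sketch of how a client might handle such a line, assuming the
"w" format proposed above.  The function names parse_w_line and
pick_router are hypothetical, and treating a missing key as weight 0 is
an assumption of this sketch.

```python
import random


def parse_w_line(line):
    """Parse 'w Key=Value ...' into a dict of non-negative integers."""
    fields = line.split()
    if not fields or fields[0] != "w":
        raise ValueError("not a w line")
    weights = {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not key or not sep or not (value.isascii() and value.isdigit()):
            raise ValueError(f"bad Key=Value pair: {field!r}")
        weights[key] = int(value)
    return weights


def pick_router(weights_by_router, purpose, rng=random):
    """Pick a router with probability proportional to its weight.

    weights_by_router maps a router identity to its parsed w line.
    Python integers do not overflow; a C client would need a 64-bit
    accumulator to hold the sum of all weights.
    """
    candidates = [(router, weights.get(purpose, 0))
                  for router, weights in weights_by_router.items()]
    total = sum(weight for _, weight in candidates)
    if total == 0:
        raise ValueError(f"no router has weight for {purpose!r}")
    target = rng.randrange(total)
    for router, weight in candidates:
        if target < weight:
            return router
        target -= weight
    raise AssertionError("unreachable: target is always below total")
```

For example, parse_w_line("w Exit=41 Guard=94 Middle=543") yields
{"Exit": 41, "Guard": 94, "Middle": 543}, and a router whose weight for
a purpose is 0 is never picked for that purpose.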

> > 3.2 Fetching descriptors on demand
> > 
> >   As described in 2.4 a descriptor lists IP address, OR- and Dir-Port,
> >   and the onion key for a server.
> > 
> >   A client already knows the IP address and the ports from the consensus
> >   documents, but without the onion key it will not be able to send
> >   CREATE/EXTEND cells for that server.  Since the client needs the onion
> >   key it needs the descriptor.
> > 
> >   If a client only downloaded a few descriptors in an observable manner
> >   then that would leak which nodes it was going to use.
> > 
> >   This proposal suggests the following:
> > 
> >   1) when connecting to a guard node for which the client does not
> >      yet have a cached descriptor it requests the descriptor it
> >      expects by hash.  (The consensus document that the client holds
> >      has a hash for the descriptor of this server.  We want exactly
> >      that descriptor, not a different one.)
> >
> >      [XXX: How?  We could either come up with a new cell type,
> >       RELAY_REQUEST_SD that takes only a hash (of the SD), or use
> >       RELAY_BEGIN_DIR.  The former is probably smarter since we will
> >       want to use it later on as well, and there we will require
> >       padding.]
> 
> My first thought was that I'd prefer to avoid multiplying machinery
> here.  When we design RELAY_REQUEST_SD, let's try to keep looking to
> see whether we can add a padding argument to RELAY_BEGIN_DIR rather
> than forcing a new relay cell type?

A new cell type saves on round-trip times.  Using BEGIN_DIR means we
send the cell, get a "CONNECTED" back, send the request, get a result.

Using a dedicated cell type will cut that in half.

> Something else to figure out here is migration.  When the first cut of
> this system is done, only new servers will support RELAY_REQUEST_SD.
> This means that clients will still need to pre-download descriptors
> under some circumstances.
> 
> In fact, the rules will be pretty weird here.  If extends are done by
> first asking B for C's descriptor, then clients need to know whether B
> supports RELAY_REQUEST_SD.  If it doesn't, they need to have C's
> descriptor, which means they need to have downloaded it in advance.
> 
> In its final version, this proposal needs a migration plan.

We could fall back to BEGIN_DIR, but that only helps when B is a
directory cache.

One approach is to introduce server support first and hold off on client
use until a significant part of the network supports it, unless we can
come up with something smarter.

> > 3.4 Exit selection
> > 
> >   Currently finding an appropriate exit node for a user's request is
> >   easy for a client because it has complete knowledge of all the exit
> >   policies of all servers on the network.
> > 
> >   [XXX: I have no finished ideas here yet.
> >     - if clients only rely on the current exit flag they will
> >       a) never use servers for exit purposes that don't have it,
> >       b) will have a hard time finding a suitable exit node for
> >          their weird port that only a few servers allow.
> >     - the authorities could create a new summary document that
> >       lists all the exit policies and their nodes (by fingerprint).
> >       I need to find out how large that document would be.
> >     - can we make the "Exit" flag more useful?  can we come
> >       up with some "standard policies" and have operators pick
> >       one of the standards?
> 
> Generally, most policies should take the form of "Here are the ports I
> allow.  Here are the addresses I disallow."  If we codify a few
> port-sets, we might be in business.

Theory: Most routers use one of a very small set of different exit policies (if
we think of a router's own IP address in its exit policy as a single token
@@IP@@ or whatever).

Maybe the consensus document should include a hash of the (normalized,
i.e. the router's IP replaced with a token) exit policy.

For instance the current consensus has the following exit policies:
    750 ea118b6480d664e6b65363d6330f62d76312309d  -
    606 6c327099a4eb1ff45128d36643d7494d706e1736  -
    200 cc5e5310ca7fc1f22b0ce1d017817934889e424a  -
     38 d640a0b886b61d20f3390f16e68f94bb67ca3e02  -
     34 04e917f4dfa4fde648fa2e64ed450cf078bba22d  -
     22 24317c522385314806403543dbdc8c4d797c412e  -
     16 c6fca3e85a850ab5ebce7e579739c30077b1e967  -
     15 c1ca7a48e410911356a8fc585744fb3ffeab99da  -
     12 cc44ee0df0f2a42868e65150e11d16274f71cd4c  -
      9 5c74594ad9dd4df0bd40e1e2cd3fe7f2906657cf  -
      8 daad277346eac48f247411dc60e2020a767861f4  -
      8 63fa7a8e07fbf314abb52b19c67a6b773052d0d9  -
      7 b2b8e7d147a05258bf375407dc29abeeeda9e25f  -
      6 a5ac5094b4a3fb116fe99cebba0787609d7c99b9  -
      6 a29974dcf97a012726710ced75eab63ff95383dc  -
      6 3c0d8576d935aae8c6c296216ac559ced03a612f  -
      6 20c7b1b0914ce4f5817b58912723d5e26974a373  -
i.e. ea11.. exists 750 times, 6c.. 606 times etc.
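
A sketch of how such a normalized hash could be computed.  The exact
normalization is not defined above, so this assumes policy lines of the
form "accept ADDR:PORT" / "reject ADDR:PORT", replaces only an address
that is exactly the router's own IP, and uses SHA-1 (the hashes above
are 40 hex digits long).  The function names are made up.

```python
import hashlib
from collections import Counter

TOKEN = "@@IP@@"  # placeholder for the router's own address


def normalize_policy(policy_lines, router_ip):
    """Return the policy as text with the router's own IP tokenized."""
    out = []
    for line in policy_lines:
        parts = line.split()
        if len(parts) == 2:
            action, target = parts
            addr, sep, port = target.rpartition(":")
            if sep and addr == router_ip:
                target = f"{TOKEN}:{port}"
            out.append(f"{action} {target}")
        else:
            # Unknown shape: keep it, with whitespace collapsed.
            out.append(" ".join(parts))
    return "\n".join(out) + "\n"


def policy_digest(policy_lines, router_ip):
    """Hex SHA-1 digest of the normalized policy."""
    text = normalize_policy(policy_lines, router_ip)
    return hashlib.sha1(text.encode("ascii")).hexdigest()


def count_policies(routers):
    """routers: iterable of (router_ip, policy_lines) pairs."""
    return Counter(policy_digest(lines, ip) for ip, lines in routers)
```

Two routers that differ only in the self-reference line, e.g.
"reject 10.0.0.1:*" versus "reject 10.0.0.2:*", end up with the same
digest, which is what makes the small set of distinct policies visible.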

Now when a Tor client first starts it will pick a route based only on the
weights it learns from the consensus.  In the course of building that
circuit it will learn a few exit policies.

Next time it needs to build a circuit it can again generate it only
based on weights.  If it is lucky it will already have the exit
policy (even if not the full SD) of the exit node.  Should that policy
not allow exits to the target, that circuit can be abandoned immediately.
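
The check could look roughly like this, assuming the client keeps a
cache of the policies it has learned, keyed by the policy hash from the
consensus.  The cache layout and both function names are hypothetical.
The matcher only understands "*" or an exact address plus "*", a single
port, or a port range; address masks are left out, and accepting when
no rule matches is an assumption of this sketch.

```python
def policy_allows(policy_lines, ip, port):
    """First-match evaluation of 'accept|reject ADDR:PORT' lines."""
    for line in policy_lines:
        action, target = line.split()
        addr, _, ports = target.rpartition(":")
        if addr not in ("*", ip):
            continue
        if ports != "*":
            low, _, high = ports.partition("-")
            if not int(low) <= port <= int(high or low):
                continue
        return action == "accept"
    return True  # nothing matched; assumed to mean accept


def should_abandon(exit_policy_hash, policy_cache, ip, port):
    """Decide whether a circuit picked by weight alone can be dropped.

    policy_cache maps a policy hash to the policy lines learned while
    building earlier circuits.  An unknown policy is not a reason to
    abandon: the client learns it in the course of building the circuit.
    """
    policy = policy_cache.get(exit_policy_hash)
    if policy is None:
        return False
    return not policy_allows(policy, ip, port)
```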

We will have a hard time exiting when only very few nodes allow
connections to a specific IP:port.  But maybe that's ok.

> >   This proposal still requires that all servers have the descriptors of
> >   every other node in the network in order to answer RELAY_REQUEST_SD
> >   cells.  These cells are sent when a circuit is extended from ending at
> >   node B to a new node C.  In that case B would have to answer a
> >   RELAY_REQUEST_SD cell that asks for C's server descriptor (by SD digest).
> >
> >   In order to answer that request B obviously needs a copy of C's server
> >   descriptor.  In the future we might amend RELAY_REQUEST_SD cells to
> >   contain also the expected IP address and OR-port of the server C (the
> >   client learns them from the network status document), so that B no
> >   longer needs to know all the descriptors of the entire network but
> >   instead can simply go and ask C for its descriptor before passing it
> >   back to the client.
> 
> We might want to include this information in RELAY_REQUEST_SD anyway
> now, so that when servers start supporting fetch-on-demand, clients
> will already be sending them the info they need to do it.  I think it
> should include an identity fingerprint digest too, so that B can open
> an authenticated OR connection to C as needed.

agreed.
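
To make the field list concrete, here is one way the request body could
be packed.  Nothing about the wire format has been decided; the field
order, the IPv4-only address, and the function names are purely
illustrative.  The only thing taken as given is that SD digests and
identity fingerprints are 20-byte SHA-1 values.

```python
import ipaddress
import struct

DIGEST_LEN = 20  # SHA-1


def pack_request_sd(sd_digest, ipv4, or_port, identity_digest):
    """Hypothetical RELAY_REQUEST_SD body:
    SD digest (20) | IPv4 address (4) | OR port (2) | identity digest (20)
    """
    if len(sd_digest) != DIGEST_LEN or len(identity_digest) != DIGEST_LEN:
        raise ValueError("digests must be 20 bytes")
    addr = ipaddress.IPv4Address(ipv4).packed
    return sd_digest + addr + struct.pack("!H", or_port) + identity_digest


def unpack_request_sd(body):
    """Inverse of pack_request_sd."""
    if len(body) != 2 * DIGEST_LEN + 6:
        raise ValueError("unexpected body length")
    sd_digest = body[:DIGEST_LEN]
    addr = str(ipaddress.IPv4Address(body[DIGEST_LEN:DIGEST_LEN + 4]))
    (or_port,) = struct.unpack("!H", body[DIGEST_LEN + 4:DIGEST_LEN + 6])
    identity_digest = body[DIGEST_LEN + 6:]
    return sd_digest, addr, or_port, identity_digest
```

At 46 bytes this fits comfortably in a single relay cell, leaving room
for whatever padding argument we settle on.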

-- 
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/