Proposal 158 revised: Microdescriptors again

Sat Jun 13 07:23:41 UTC 2009

On Sat, May 16, 2009 at 12:46:40AM -0400, Nick Mathewson wrote:
> Filename: 158-microdescriptors.txt
> Title: Clients download consensus + microdescriptors

We should do this one too, also in the 0.2.2.x timeframe. The microdesc
flavor can be the test case for consensus flavors.

> 1. Overview
> 
>   This proposal replaces section 3.2 of proposal 141, which was
>   called "Fetching descriptors on demand". Rather than modifying the
>   circuit-building protocol to fetch a server descriptor inline at each
>   circuit extend, we instead put all of the information that clients need
>   either into the consensus itself, or into a new set of data about each
>   relay called a microdescriptor. The microdescriptor is a direct
>   transform from the relay descriptor, so relays don't even need to know
>   this is happening.

This last sentence is false, now that authorities serve microdescs and
relays cache and serve them. A replacement sentence might be "".

> 3. Design
> 
>   There are three pieces to the proposal. First, authorities will list in
>   their votes (and thus in the consensus)  the expected hash
>   of microdescriptor for each relay. Second, directory mirrors will serve
>   microdescriptors. Third, clients will ask for them and cache them.

  There are three pieces to the proposal. First, authorities will list
  in their votes (and thus in the consensus) the expected hash of the
  microdescriptor for each relay. Second, authorities will serve
  microdescriptors, and directory mirrors will cache and serve them.
  Third, clients will ask for them and cache them.

> 3.1. Consensus changes
> 
>   If the authorities choose a consensus method of a given version or
>   later, a microdescriptor format is implicit in that version.
>   A microdescriptor should in every case be a pure function of the
>   router descriptor and the conensus method.

s/conensus/consensus/

>   In votes, need to include the hash of each expected microdescriptor in
>   the routerstatus section. I suggest a new "m" line for each stanza,
>   with the base64 of the SHA256 hash of the router's microdescriptor.
> 
>   For every consensus method that an authority supports, it includes a
>   separate "m" line in each router section of its vote, containing:
>     "m" SP methods SP digest NL
>   where methods is a comma-separated list of the consensus methods
>   that the authority believes will produce "digest".

Learning from our lessons in proposal 162, shouldn't the m line in the
vote be
  "m" SP methods 1*(SP AlgorithmName "=" Digest) NL
?

On the one hand, it isn't critical to plan this part ahead of time,
since only the authorities would need to update in order to understand
a new voting format. But on the other hand, when the time does come,
a) we'd need a flag hour for authorities to recognize the new format,
and those suck, and b) do we really want to be designing and deploying
this part on the day we finally realize we need to switch hashes? :)

(I think it's fine for the consensus itself to not specify any
algorithm names; that's what flavors are for. But since we don't have vote
flavors, the votes need to say which algorithms produce which digests,
so the authorities can make all the consensus flavors.)
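
To make the suggested vote format concrete, here is a minimal sketch of
a parser for it. The function name and the algorithm names in the usage
note ("sha256", "sha3") are made up for illustration; the proposal
doesn't fix any of them.

```python
# Hypothetical parser for the suggested vote line:
#   "m" SP methods 1*(SP AlgorithmName "=" Digest) NL
def parse_m_line(line):
    """Return (methods, digests) for one "m" line of a vote."""
    if not line.endswith("\n"):
        raise ValueError("m line must end with NL")
    parts = line[:-1].split(" ")
    # Need the keyword, the methods list, and at least one digest item.
    if len(parts) < 3 or parts[0] != "m":
        raise ValueError("not a well-formed m line")
    methods = [int(m) for m in parts[1].split(",")]
    digests = {}
    for item in parts[2:]:
        # Split on the first "=" only, so base64 padding stays in the digest.
        name, sep, digest = item.partition("=")
        if not sep or not name or not digest:
            raise ValueError("bad digest item: %r" % item)
        if name in digests:
            raise ValueError("duplicate algorithm: %r" % name)
        digests[name] = digest
    return methods, digests
```

With this shape, parse_m_line("m 8,9 sha256=abc sha3=def\n") gives the
methods [8, 9] plus one digest per algorithm, so an authority could
build each consensus flavor from the same vote.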

> 3.1.1. Descriptor elements to include for now
> 
>   In the first version, the microdescriptor should contain the
>   onion-key element and the family element from the router descriptor.

Should we put the exit policy summary here too, and take it out of the
main consensus? I don't think anybody actually uses it from the main
consensus yet.

> 3.1.2. Computing consensus for microdescriptor-elements and "m" lines
> 
>   When we generating a consensus, we use whichever m line
>   unambiguously corresponds to the descriptor digest that will be
>   included in the consensus.  (If there are multiple m lines for that
>   descriptor digest, we use whichever is most common.  If they are
>   equally common, we break ties in the favor of the lexically
>   earliest.  Either way, we should log a warning: That's likely a
>   bug.)

I don't understand the above. The microdescriptor is a straight
function of a) the consensus method, and b) whichever relay descriptor
is a winner for this vote. So that means the "m" line we pick for the
consensus has to be the microdescriptor digest from whichever votes a)
voted for the winner, and b) offered the consensus method that we chose
to use for constructing this consensus. If any such votes differ, we'll
have other problems, like disagreeing authorities serving disagreeing
microdescriptors. I guess that's what you're getting at above?
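
Here is a rough sketch of the selection rule as I read it, including
the tie-breaking from the quoted text. The vote representation (a dict
with 'descriptor_digest' and an 'm' mapping from consensus method to
microdescriptor digest) is an assumption for illustration, not how any
implementation stores votes.

```python
from collections import Counter

def pick_microdesc_digest(votes, winner_digest, method):
    """Pick the microdescriptor digest to list in the consensus.

    Only votes that a) voted for the winning relay descriptor and
    b) offered an m line for the chosen consensus method are counted.
    Returns (digest, disagreement); digest is None if no vote qualifies.
    """
    candidates = [
        v["m"][method]
        for v in votes
        if v["descriptor_digest"] == winner_digest and method in v["m"]
    ]
    if not candidates:
        return None, False
    counts = Counter(candidates)
    # Most common first; ties go to the lexically earliest digest.
    best = min(counts, key=lambda d: (-counts[d], d))
    # More than one distinct digest is likely a bug, so the caller
    # should log a warning when this flag is set.
    return best, len(counts) > 1
```

If the microdescriptor really is a pure function of the descriptor and
the method, the disagreement flag should never come back True.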

>   We still need to descide whether to move ports into microdescriptors
>   or not.  In either case, they can be removed from the current "ns"
>   flavor of consensus, since no current clients use them, and they
>   take up about 5% of the compressed consensus.

Ah, here it is. Yes, I think we should put them in.

> 3.2. Directory mirrors serve microdescriptors
> 
>   Directory mirrors should then read the microdescriptor-elements line
>   from the consensus, and learn how to answer requests. (Directory mirrors
>   continue to serve normal relay descriptors too, a) to serve old clients
>   and b) to be able to construct microdescriptors on the fly.)

That's wrong now. New text:

  Directory mirrors should then fetch, cache, and serve each
  microdescriptor from the authorities. (They need to continue to serve
  normal relay descriptors too, to handle old clients.)

>   The microdescriptors with base64 hashes <D1>,<D2>,<D3> should be available at:
>     http://<hostname>/tor/micro/d/<D1>-<D2>-<D3>.z
>   (We use base64 for size and for consistency with the consensus
>   format. We use -s instead of +s to separate these items, since
>   the + character use used in base64 encoding.)

s/use used/is used/
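
For illustration, a small sketch of how a client might build that
request URL. It assumes the digest is the SHA256 of the microdescriptor
body, base64-encoded with the trailing "=" padding dropped; the helper
names and the hostname in the usage note are made up.

```python
import base64
import hashlib

def microdesc_digest(body):
    """Base64 of the SHA256 of a microdescriptor body, padding removed.

    Dropping the trailing "=" is an assumption made for this sketch.
    """
    raw = hashlib.sha256(body).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")

def micro_url(hostname, digests):
    """Join digests with "-", which never appears in standard base64."""
    return "http://%s/tor/micro/d/%s.z" % (hostname, "-".join(digests))
```

For example, micro_url("example.com", [d1, d2, d3]) yields
http://example.com/tor/micro/d/<D1>-<D2>-<D3>.z as in the quoted text.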

>   All the microdescriptors from the current consensus should also be
>   available at:
>     http://<hostname>/tor/micro/all.z

How will these two URLs interact with future flavors? If a later flavor
uses a different hash function, do we still offer everything under
/tor/micro/d/<D>, even though different clients are verifying results
with different hash functions?

And which microdescriptors get put in /tor/micro/all if there are two
active flavors?

One answer is to make it /tor/micro/F/d/<D1> and /tor/micro/F/all.
So long as we don't change our hash function, that should do it.

(In practice, implementations can realize that /F1/d/D1 and /F2/d/D1
are the same and don't need to be stored in duplicate.)
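
A minimal sketch of that kind of deduplication, assuming every flavor
uses the same hash function so that equal digests mean equal bodies.
The class and method names are hypothetical.

```python
class MicrodescCache:
    """Store each microdescriptor body once, however many flavors list it."""

    def __init__(self):
        self._bodies = {}   # digest -> body, shared across flavors
        self._flavors = {}  # flavor name -> set of digests it lists

    def add(self, flavor, digest, body):
        # Keep the first copy of the body; later flavors only add a reference.
        self._bodies.setdefault(digest, body)
        self._flavors.setdefault(flavor, set()).add(digest)

    def get(self, flavor, digest):
        # Answers /tor/micro/<flavor>/d/<digest>; None if that flavor
        # does not list the digest.
        if digest in self._flavors.get(flavor, ()):
            return self._bodies[digest]
        return None

    def stored_bodies(self):
        return len(self._bodies)
```

Adding the same digest under F1 and F2 leaves one stored body that is
reachable through either flavor's URL.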

If we do change our hash function, caches that don't recognize the flavor
won't be able to verify that the microdescriptor hashes are correct. And
since they won't know whether we've changed the hash function in a
flavor they don't recognize, does that mean that caches should never
check hashes on flavors they don't recognize?

>   so a client that's bootstrapping doesn't need to send a 70KB URL just
>   to name every microdescriptor it's looking for.

That's a 110KB URL if we use SHA256. :)
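
The arithmetic, as a quick sketch. The relay count of 2500 is my
assumption, chosen because it reproduces the 70KB figure for SHA1; the
digests are taken to be base64 with the "=" padding dropped and joined
by single "-" characters.

```python
import base64
import hashlib

def digest_list_bytes(hash_name, relays):
    """Bytes needed to name every microdescriptor in one URL."""
    raw_len = hashlib.new(hash_name).digest_size
    # Length of one unpadded base64 digest for this hash.
    b64_len = len(base64.b64encode(b"\0" * raw_len).rstrip(b"="))
    # One digest per relay, with a "-" between neighbours.
    return relays * b64_len + (relays - 1)

sha1_bytes = digest_list_bytes("sha1", 2500)      # 27 chars per digest
sha256_bytes = digest_list_bytes("sha256", 2500)  # 43 chars per digest
```

That comes to just under 70000 bytes for SHA1 and just under 110000
bytes for SHA256.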

> 3.3.1. Information leaks from clients
> 
>   If a client asks you for a set of microdescs, then you know she didn't
>   have them cached before. How much does that leak? What about when
>   we're all using our entry guards as directory guards, and we've seen
>   that user make a bunch of circuits already?
> 
>   Fetching "all" when you need at least half is a good first order fix,
>   but might not be all there is to it.
> 
>   Another future option would be to fetch some of the microdescriptors
>   anonymously (via a Tor circuit).

Other crazy options include doing decoy fetches, so somebody seeing a
fetch can't conclude much from it.

> 4. Transition and deployment
> 
>   Phase one, the directory authorities should start voting on
>   microdescriptors and microdescriptor elements, and putting them in the

s/ and microdescriptor elements//

>   consensus.

Thanks!
--Roger