[tor-dev] Walking Onions status update: week 2 notes

Fri Mar 20 12:59:46 UTC 2020

On Sat, Mar 14, 2020 at 12:44 AM teor <teor at riseup.net> wrote:
>
> Hi Nick,
>
> I'm interested in following along with Walking Onions, but I might
> drop out when the relay IPv6 work gets busy.
>
> I'm not sure how you'd like feedback, so I'm going to try to put it
> in emails, or in pull requests.
>
> (I made one comment on a git commit in walking-onions-wip, but
> I'm not sure if you see those, so I'll repeat it here.)

Thanks, this is really helpful!  I missed the repository comments, and
I'll probably miss some more.

> > On 14 Mar 2020, at 03:52, Nick Mathewson <nickm at torproject.org> wrote:
> >
> > This week, I worked on specifying the nitty-gritty of the SNIP and
> > ENDIVE document formats.  I used the CBOR meta-format [CBOR] to
> > build them, and the CDDL specification language [CDDL] to specify
> > what they should contain.
> >
> > As before, I've been working in a git repository at [GITHUB]; you
> > can see the document I've been focusing on this week at
> > [SNIPFMT].  (That's the thing to read if you want to send me
> > patches for my grammar.)
>
> I'm not sure if you've got to exit ports yet, but here's one possible
> way to partition ports:
> * choose large partitions so that all exits support all ports in the
>   partition
> * choose smaller categories so that most exits support most ports
>   in the partition
> * ignore small partitions, they're bad for client privacy anyway
>
> For example, you might end up with:
> * web (80 & 443)
> * interactive (SSH, IRC, etc.)
> * bulk (torrent, etc.)
> * default exit policy
> * reduced exit policy
>
> I'm not sure if we will want separate categories for IPv4-only
> and dual-stack policies. We can probably ignore IPv6-only
> policies for the moment, but we should think about them in
> future.

Interesting!  Yeah, something like this might work.  I've added this
to my notes.
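
To make the partition idea concrete, here is a minimal sketch of how
one might check which port partitions an exit fully supports.  The
partition names and port numbers are illustrative assumptions taken
loosely from the example list above, not anything specified in the
proposal.

```python
# Hypothetical partitions; the ports chosen here are illustrative only.
PARTITIONS = {
    "web": {80, 443},
    "interactive": {22, 6667},  # SSH and IRC, as examples
}


def supported_partitions(exit_ports, partitions=PARTITIONS):
    """Return the names of partitions whose ports the exit all allows."""
    allowed = set(exit_ports)
    return sorted(
        name for name, ports in partitions.items() if ports <= allowed
    )
```

An exit that allows ports 22, 80, and 443 but not 6667 would only be
listed under "web" here, which is the "all exits support all ports in
the partition" rule.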

Also, there are some interesting ideas about handling exit policies in
the whitepaper's section 6.  I don't know if

> > There were a few neat things to do here:
> >
> >   * I had to define SNIPs so that clients and relays can be
> >     mostly agnostic about whether we're using a Merkle tree or a
> >     bunch of signatures.
> >
> >   * I had to define a binary diff format so that relays can keep
> >     on downloading diffs between ENDIVE documents. (Clients don't
> >     download ENDIVEs).  I did a quick prototype of how to output
> >     this format, using Python's difflib.
>
> Can we make the OrigBytesCmdId use start and length?
> length may be shorter than end, and it will never be longer.

Good idea; done.
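
For readers following along, here is a rough sketch of a difflib-based
diff that emits copy commands as (start, length).  The command tags
COPY_ORIG and INSERT and both function names are hypothetical; the
actual command encoding is whatever [SNIPFMT] specifies, and this is
not the prototype from the repository.

```python
from difflib import SequenceMatcher

# Hypothetical command tags, for illustration only.
COPY_ORIG = "orig"    # copy `length` bytes of the original from `start`
INSERT = "insert"     # emit literal bytes


def make_diff(old: bytes, new: bytes):
    """Build a list of commands that turns `old` into `new`."""
    cmds = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Store start and length rather than start and end.
            cmds.append((COPY_ORIG, i1, i2 - i1))
        elif tag in ("replace", "insert"):
            cmds.append((INSERT, new[j1:j2]))
        # "delete" needs no command: the bytes are simply not copied.
    return cmds


def apply_diff(old: bytes, cmds):
    """Apply commands produced by make_diff to `old`."""
    out = bytearray()
    for cmd in cmds:
        if cmd[0] == COPY_ORIG:
            _, start, length = cmd
            out += old[start:start + length]
        else:
            out += cmd[1]
    return bytes(out)
```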

> If we are doing chunk-based encoding, we could make start
> relative to the last position in the original file. But that would
> mean no back-tracking, which means we can't use some
> more sophisticated diff algorithms.

Well, we could allow signed integers.  I'm making a note to look into
whether this would help much.
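
As a sketch of what signed relative offsets could look like: each
copy command stores its start as a delta from the position just after
the previous copy, and a negative delta means back-tracking.  The
function names and the (delta, length) layout are assumptions for
illustration, not part of the format.

```python
def to_relative(copies):
    """Convert absolute (start, length) pairs to (delta, length) pairs.

    Each delta is measured from the end of the previous copy, so it is
    negative whenever the diff back-tracks in the original file.
    """
    pos = 0
    out = []
    for start, length in copies:
        out.append((start - pos, length))
        pos = start + length
    return out


def to_absolute(relative):
    """Invert to_relative."""
    pos = 0
    out = []
    for delta, length in relative:
        start = pos + delta
        out.append((start, length))
        pos = start + length
    return out
```

Whether this saves anything depends on how often deltas are small
enough to encode in fewer bytes than absolute offsets.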

[...]
> If the issue is having multiple valid ENDIVEs, then authorities could
> also put a cap on the number of concurrently valid ENDIVEs.
>
> There are two simple schemes to implement a cap:
> * set a longer interval for rebuilding all ENDIVEs
>   (the cap is the rebuild interval, divided by the validity interval)
> * refuse to sign a new SNIP for a relay that's rapidly changing
>   (or equivalently, leave that relay out of the next ENDIVE)
>
> Both these schemes also limit the amount of bandwidth used
> for a relay that's rapidly changing details.

Interesting idea; I think in the case of the first one, we'd be giving
up something important, but I don't know how much.  The second one
might actually help with our network stability, though.

[...]
>
> Do "tricky restrictions" include the IP subnet restriction (avoid
> relays in the same IPv4 /16 and IPv6 /32) ?

I'm thinking of _all_ tricky restrictions, including but not limited
to IP subnets, families, user settings, and
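
For reference, the subnet restriction mentioned in the question can be
sketched with Python's ipaddress module as below.  The function name
is hypothetical, and this only illustrates the /16 and /32 comparison,
not how Tor itself implements the check.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str) -> bool:
    """True if both addresses share an IPv4 /16 or an IPv6 /32."""
    a = ipaddress.ip_address(addr_a)
    b = ipaddress.ip_address(addr_b)
    if a.version != b.version:
        return False
    prefix = 16 if a.version == 4 else 32
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{addr_a}/{prefix}", strict=False)
    return b in network
```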

> What about a heterogeneous IPv4 / IPv6 network, where
> IPv4-only relays can't connect to IPv6-only relays?

This one would fit more into "alternative topologies", but I think the
design can handle that.  (See proposal 300 section 3.9.)

The way it would work is, you put IPv4-only relays into group A,
dual-stack relays into group B, and IPv6-only relays into group C.

Then you give them different successor lists, so that A has successors
in A and B, C has successors in B and C, and B can have all
successors.
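
A minimal sketch of that grouping, assuming each relay is described
only by whether it has an IPv4 and an IPv6 address.  The table and
function names are hypothetical and are not taken from proposal 300.

```python
# Which groups each group may list as successors.
ALLOWED_SUCCESSORS = {
    "A": {"A", "B"},        # IPv4-only: IPv4-only or dual-stack
    "B": {"A", "B", "C"},   # dual-stack: anything
    "C": {"B", "C"},        # IPv6-only: dual-stack or IPv6-only
}


def group_of(has_ipv4: bool, has_ipv6: bool) -> str:
    """Map a relay's address families to group A, B, or C."""
    if has_ipv4 and has_ipv6:
        return "B"
    if has_ipv4:
        return "A"
    if has_ipv6:
        return "C"
    raise ValueError("relay has no usable address")


def can_be_successor(src, dst) -> bool:
    """src and dst are (has_ipv4, has_ipv6) tuples."""
    return group_of(*dst) in ALLOWED_SUCCESSORS[group_of(*src)]
```

With this table an IPv4-only relay never gets an IPv6-only successor,
and vice versa, while dual-stack relays can bridge the two.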

> If we do decide to add IPv6-only relays, we'll probably add
> them in this order:
> * IPv6-only bridges (needs dual-stack bridge guards / middles?)
> * IPv6-only exits (needs dual-stack middles)
> * IPv6-only guards (needs dual-stack middles)
> * IPv6-only middles (needs dual-stack or IPv6-only guards and
>    exits, removes need for dual-stack middles)
>
> What about bridge guards?
> (That is, can bridges add an extra hop into circuits, to protect
> themselves from being discovered by middles?)

Yes, that should still work with the base design.  I'll need to think
more about how it would work in non-clique topologies, though.