[tor-dev] Proposal 247: Alternate Path Lengths

Thu Jan 21 14:13:35 UTC 2016

Mike Perry <mikeperry at torproject.org> writes:

> George Kadianakis:
> > Mike Perry <mikeperry at torproject.org> writes:
> >
> > <snip>
> >
> > I have mixed feelings about this.
> >
> > - If client guard discovery is the main reason we are doing this, I think we
> >   should first look into these guard discovery vectors individually and figure
> >   out how concerning they are and if there is anything else we can do to block
> >   them,
>
> I agree this is worthwhile, if only to better understand the design
> space.  However, I think we're going to find that most applications we
> envision can be induced into violating many of the ad-hoc mitigations we
> try to bake in.
>

OK. Let's see. I feel that these guard discovery attacks can be blocked with:

a) If an IP listed on an HS descriptor tells you that it doesn't know the HS,
   then ignore it for this hidden service today.

b) If an HSDir that should have an HS descriptor tells you that it doesn't have
   it, then don't ask it again this hour.

I think we do both checks right now in the Tor codebase and we also have caches
so that we don't retry the same nodes. If we are serious, we could even write
those caches on disk.
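
To make checks (a) and (b) concrete, here is a minimal Python sketch of a
per-service failure cache with expiry. The class name, the key layout and the
TTL values (24 hours for "today", 1 hour for "this hour") are assumptions made
for the example; this is not how the Tor codebase actually implements it.

```python
import time

# Hypothetical TTLs: "today" for intro points, "this hour" for HSDirs.
INTRO_POINT_TTL = 24 * 60 * 60
HSDIR_TTL = 60 * 60


class FailureCache:
    """Remembers relays that failed for a given onion service until a TTL expires."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable so the cache can be tested
        self._expiry = {}            # (service, relay) -> expiry timestamp

    def note_failure(self, service, relay, ttl):
        """Record that `relay` did not know about `service`; block it for `ttl` seconds."""
        self._expiry[(service, relay)] = self._clock() + ttl

    def is_blocked(self, service, relay):
        """Return True while the failure entry for (service, relay) is still fresh."""
        expiry = self._expiry.get((service, relay))
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry is stale: forget it and allow the relay again.
            del self._expiry[(service, relay)]
            return False
        return True
```

Note that the entries are keyed per hidden service, so a relay that fails for
one service stays usable for others. Persisting `_expiry` to disk would be the
"serious" variant, and it only helps if the application does not wipe that
state.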

I feel that if an application restarts Tor or flushes those caches because a
hidden service does not work, then the application is doing it wrong.

Also even with client vanguards I think the checks above will still have to be
implemented. I could imagine an application that flushes all the DataDirectory
if the hidden service stops working, and then even vanguards won't save them.

In general, I'm not sure how much sanity we can assume from third-party
applications.

> > before complicating path selection even more.
>
> I feel like you're actually going to end up complicating the
> implementation more with this position. If we have to have separate path
> selection modes for service side and client side, we then have to
> maintain three different path selection mechanisms in Tor: normal exit,
> onion services, and onion clients.
>
> If we gave the same options for both hidden services and clients, we are
> at least down to two systems (exit vs non-exit), with some minor options
> for each.
>

Hmmm maybe. But onion clients would look very much like normal exit, but they
would connect to RPs/IPs, instead of exits. Just like the code is now.

Also, with vanguards if we end up doing something like:

        HSDir: C - L - S - E - HSDir
        IP:    C - L - S - E - IP
        Rend:  C - L - M - RP -- S - M - L - HS

we have three different path types here. We would need to write very beautiful
interfaces if we want this to be done by the same code.

> > - Also, I like symmetry myself, but I wouldn't change path selection and
> >   security just for that _if I can help it_.
> >
> > <snip>
> >
> > >
> > >
> > > Hsdir post/fetch:
> > >   1. C - L - M - S - E - H
> > >   2. C - L - S - E - H
> > >   3. C - L - S - H
> > >
> > > Intro:
> > >   1. C - L - M - S - E -- I   - S - M - L - H
> > >   2. C - L - S - E     -- I   - S - L - H
> > >  *3. C - L - S         -- I&S - L - H     (* IP Intersection attack!)
> > >
> > > Rend:
> > >   1. C - L - M - S - R -- E - S - M - L - H
> > >   2. C - L - S - R     -- E - S - L - H
> > >   3. C - L - R&S       -- S - L - H
> > >
> >
> > What is R&S here? Clients use static short-lifespan rendezvous points?
>
> Yes. Similarly for I&S (which we should not do - it's bad in every
> variation of Vanguards).
>
> I don't see any such problems with R&S though, since R is not associated
> with any publicly viewable information, I don't think it is as big of a
> problem. At best it's a linkability risk for the client. But maybe I
> missed something.
>

Hmm, the only problem I can see here is that the R&S can link clients based on
the L node. So for example, in the crazy edge case where only one client
connects to hidden services through R&S over L, then R&S could count "Ah this
client has done 42 rendezvous through me in the past 5 hours". And if that's a
ricochet client with 42 contacts maybe it's a selector. But I think this is a
pretty far-fetched example...

Another _big_ gotcha here is that let's say we end up doing:

        HSDir: C - L - M - S - E - HSDir
        IP:    C - L - M - S - E - IP
        Rend:  C - L - S - RP -- S - M - L - HS

and all the 'S' nodes are taken from the same pool, then the 'L' node will be
able to learn 'M' by looking at the IP circuits, and learn 'S' by looking at the
rend circuit. So it will basically be able to derive the full circuit.

We need to be very careful about which paths we pick, and which "guardsets" we
get the nodes from.
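
As a rough illustration of the leak in that example, the Python sketch below
models only the hop labels of the client-side paths and asks which labels sit
directly after 'L'. It assumes a relay sees nothing but its immediate
neighbours on a circuit; the function names are made up for the example.

```python
# Client-side path templates from the example above (hop labels, not relays).
PATHS = {
    "HSDir": ["C", "L", "M", "S", "E", "HSDir"],
    "IP":    ["C", "L", "M", "S", "E", "IP"],
    "Rend":  ["C", "L", "S", "RP"],
}


def next_hop_seen_by(position, path):
    """Return the label of the hop directly after `position`, or None if it is last."""
    idx = path.index(position)
    return path[idx + 1] if idx + 1 < len(path) else None


def roles_learned_by(position, paths):
    """Collect every hop label that `position` sees as its next hop across all path types."""
    return {next_hop_seen_by(position, p) for p in paths.values()} - {None}


print(sorted(roles_learned_by("L", PATHS)))
```

With these templates 'L' sees 'M' as its next hop on HSDir/IP circuits and 'S'
on rend circuits. If the 'S' nodes come from one shared pool, the relays it
meets on rend circuits are the same ones that sit behind 'M' elsewhere.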

> > > Looking at these, we can see that we sacrifice the middle guards in the
> > > second option, which will come at the cost of one less compromise attack
> > > (but still the need to compromise the long-lived guard). We also lose
> > > the unlinkability in the third option, and this actually bites us in
> > > Intro 3: the hidden service L guard can perform a long-term intersection
> > > attack, watching for published intro points and matching that to the
> > > circuits that H makes to them. So that path length probably should not
> > > be used.
>
> <snip>
>
> > However, I still have mixed feelings about changing client path selection as
> > part of proposal 247:
> >
> > - My main issue is that I think figuring out the right client path selection
> >   will require a _heavy_ amount of security analysis that will delay prop247
> >   even more.  I was hoping that we could treat the client-side as an
> >   orthogonal problem and tackle it in the future separately. But maybe I'm
> >   totally wrong and should be more patient and these two problems should be
> >   handled together.
>
> I think patience is best, because if we don't understand this problem
> really well, we're liable to miss something. Or cement ourselves off
> from a potential future of interactive HS voice+video. Neither one is a
> great failure mode.
>

Agreed.

> I think for many applications (esp the browser and ricochet), we're
> going to find that we need to protect the client just as much as the
> server.
>
> > - If the above changes only happen to HS circuits, we make it harder to make
> >   HS circuits indistinguishable from normal circuits on the face of traffic
> >   analysis. But maybe we have already lost this game.
>
> We already lost that game until we have multihop padding. Proposal
> 247 already outlines how to use it in section 4.1 to help conceal
> vanguard usage.
>
> It is also worth pointing out that if we fail to conceal the HS vanguard
> fingerprint entirely with padding, it will be especially valuable to
> have more than just 30k service-side instances with the vanguard
> fingerprint. Far better to have all the clients in that anonymity set,
> too, I think.
>

Yes that's true. This seems to be the main argument for doing client vanguards
right now for me.

However, to actually achieve any sort of confusion here, we need to ensure that
the paths between clients and HSes are symmetric. So for example if we end up
doing:

    C - L - S - E -- IP  - S - M - L - H

then the L guard could distinguish clients from HSes by looking at whether the
second hop is short-lived ('S') or medium-lived ('M').
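
The symmetry requirement can be written down as a small check. This is only a
sketch over hop labels: it assumes the L guard can tell an 'S' hop from an 'M'
hop (for instance by how often its next hop rotates), and the function names
are invented for the example. The symmetric pair is "Intro 1" from the quoted
list above.

```python
# Asymmetric example from above, each side read outward from its own endpoint.
ASYM_CLIENT = ["C", "L", "S", "E", "IP"]
ASYM_SERVICE = ["H", "L", "M", "S", "IP"]

# "Intro 1" from the quoted proposal text: both sides have M after L.
SYM_CLIENT = ["C", "L", "M", "S", "E", "I"]
SYM_SERVICE = ["H", "L", "M", "S", "I"]


def hop_after_guard(path):
    """Return the label of the hop that the L guard extends to."""
    return path[path.index("L") + 1]


def guard_can_distinguish(client_path, service_path):
    """True if the hop after L differs between the client and the service side."""
    return hop_after_guard(client_path) != hop_after_guard(service_path)


print(guard_can_distinguish(ASYM_CLIENT, ASYM_SERVICE))  # True
print(guard_can_distinguish(SYM_CLIENT, SYM_SERVICE))    # False
```

This only compares the hop right after L. It says nothing about other
distinguishers such as total path length or traffic patterns.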

Woohoo! Anonymity!