[tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

Tue May 21 13:14:48 UTC 2019

On 16 May (14:20:05), George Kadianakis wrote:

Hello!

> 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
> 
>    In this section we give an overview of how circuit construction looks like
>    to a network or guard-level adversary. We use this knowledge to make the
>    right padding machines that can make intro and rend circuits look like these
>    general circuits.
> 
>    In particular, most general Tor circuits used to surf the web or download
>    directory information, start with the following 6-cell relay cell sequence (cells
>    surrounded in [brackets] are outgoing, the others are incoming):
> 
>      [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
> 
>    When this is done, the client has established a 3-hop circuit and also
>    opened a stream to the other end. Usually after this comes a series of DATA
>    cell that either fetches pages, establishes an SSL connection or fetches
>    directory information:
> 
>      [DATA] -> [DATA] -> DATA -> DATA
> 
>    The above stream of 10 relay cells defines the grand majority of general
>    circuits that come out of Tor browser during our testing, and it's what we
>    are gonna use to make introduction and rednezvous circuits blend in.

Given that the description says "either fetches pages, ...", I'm confused
about how only 2 DATA cells in each direction can describe the vast majority
of circuits?

A simple "wget torproject.org" gives me an index.html of 16KB, which means at
least 32 DATA cells. Even a directory fetch can't be only 2 DATA cells... ?
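
As a rough check on that count, here is a small sketch. It assumes a relay
DATA cell carries at most 498 bytes of payload (my recollection of tor-spec)
and it ignores HTTP headers and TLS overhead. `data_cells_needed` is only an
illustrative helper, not a function in tor.

```python
import math

# Assumption: maximum application bytes carried by one RELAY_DATA cell.
RELAY_PAYLOAD_MAX = 498

def data_cells_needed(num_bytes, payload=RELAY_PAYLOAD_MAX):
    """Lower bound on the DATA cells needed to carry num_bytes of payload."""
    return math.ceil(num_bytes / payload)

# A 16KB index.html, counting only the body bytes.
print(data_cells_needed(16 * 1024))
```

Either way the result is well above the 2 incoming DATA cells in the quoted
sequence.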

Is the idea that "there will always be a minimum of 2 DATA cells in each
direction", so you want to match that for HS client circuits and then send a
bunch of padding to match whatever comes next on a general circuit, so that
"at least we'll have 10 cells like any other circuit"?
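
To make the "match the first 10 cells" reading concrete, here is a minimal
sketch. It assumes the adversary only sees the direction of each cell (not
its relay command), and it takes the cell sequences from the quoted proposal
text (general circuits above, introduction circuits in section 5.1 below).
The list names and `matching_prefix` are made up for illustration.

```python
OUT, IN = "out", "in"

# General circuit: two EXTEND2 round trips, stream open, then 4 DATA cells.
GENERAL = [("EXTEND2", OUT), ("EXTENDED2", IN)] * 2 + [
    ("BEGIN", OUT), ("CONNECTED", IN),
    ("DATA", OUT), ("DATA", OUT), ("DATA", IN), ("DATA", IN),
]

# Introduction circuit: three EXTEND2 round trips before INTRODUCE1.
INTRO_BUILD = [("EXTEND2", OUT), ("EXTENDED2", IN)] * 3
INTRO_PLAIN = INTRO_BUILD + [("INTRODUCE1", OUT), ("INTRODUCE_ACK", IN)]
INTRO_PADDED = INTRO_BUILD + [
    ("INTRODUCE1", OUT), ("PADDING_NEGOTIATE", OUT),
    ("PADDING_NEGOTIATED", IN), ("INTRODUCE_ACK", IN),
]

def matching_prefix(a, b):
    """Count leading cells whose directions agree in both sequences."""
    n = 0
    for (_, dir_a), (_, dir_b) in zip(a, b):
        if dir_a != dir_b:
            break
        n += 1
    return n

print(matching_prefix(GENERAL, INTRO_PLAIN))
print(matching_prefix(GENERAL, INTRO_PADDED))
```

Under that assumption the unpadded introduction circuit diverges at the
INTRODUCE_ACK, while the padded one agrees in direction for all 10 cells. It
says nothing about what follows those 10 cells, which is my question above.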

> 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
> 
>    These two machines are meant to hide client-side introduction circuits. The
>    origin-side machine sits on the client and sends padding towards the
>    introduction circuit, whereas the relay-side machine sits on the middle-hop
>    (second hop of the circuit) and sends padding towards the client. The
>    padding from the origin-side machine terminates at the middle-hop and does
>    not get forwarded to the actual introduction point.
> 
>    Both of these machines only get activated for introduction circuits, and
>    only after an INTRODUCE1 cell has been sent out.
> 
>    This means that before the machine gets activated our cell flow looks like this:
> 
>     [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
> 
>    Comparing the above with section [CIRCCONSTRUCTION], we see that the above
>    cell sequence matches the one from general circuits up to the first 7 cells.
> 
>    However, in normal introduction circuits this is followed by an
>    INTRODUCE_ACK and then the circuit gets teared down, which does not match
>    the sequence from [CIRCCONSTRUCTION].
> 
>    Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
>    send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
>    cell and an INTRODUCE_ACKED cell. This makes us match the [CIRCCONSTRUCTION]
>    sequence up to the first 10 cells.
> 
>    After that, we continue sending padding from the relay-side machine so as to
>    fake a directory download, or an SSL connection setup. We also want to
>    continue sending padding so that the connection stays up longer to destroy
>    the "Duration of Activity" fingerprint.

I had a quick look at the implementation, and these DROP cells aren't
accounted for in our circuit flow control. That means there will be a
difference between a "real" DATA circuit and a circuit being sent padding in
order to look like one: the flow control cell(s) (SENDME) coming back from
the endpoint that is receiving the data.

In other words, one circuit (the padded one) will have only a long stream of
cells going in one direction and the second circuit (with legit data) will
have that long stream but now and then a cell coming back down the circuit.

I believe this is quite a distinguisher between a circuit seeing a lot of
padding and one that isn't? :S
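
A toy model of that difference, as a sketch only. It assumes the receiver
sends a circuit-level SENDME for every 100 DATA cells and a stream-level
SENDME for every 50, and that DROP cells count towards neither window. The
constants and `reverse_sendmes` are illustrative names, not tor identifiers.

```python
# Assumed flow control increments (cells received per SENDME sent back).
CIRC_SENDME_INCREMENT = 100
STREAM_SENDME_INCREMENT = 50

def reverse_sendmes(n_cells, counted_by_flow_control):
    """SENDME cells sent back after receiving n_cells in one direction."""
    if not counted_by_flow_control:
        # DROP (padding) cells: no window is decremented, so nothing returns.
        return 0
    circ_level = n_cells // CIRC_SENDME_INCREMENT
    stream_level = n_cells // STREAM_SENDME_INCREMENT
    return circ_level + stream_level

for n in (10, 300):
    print(n, reverse_sendmes(n, True), reverse_sendmes(n, False))
```

Under these assumptions a flow of only ~10 cells looks the same either way,
and the gap only shows once the padded flow grows past the first SENDME
threshold.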

> 
>    To calculate the padding overhead, we see that the origin-side machine just
>    sends a single [PADDING_NEGOATIATE] cell, wheras the origin-side machine

Typo here "PADDING_NEGOATIATE".

>    sends a PADDING_NEGOTIATED cell and between 7 to 10 DROP cells. This means
>    that the average overhead of this machine is 11 padding cells.
> 
>    In terms of WTF-PAD terminology, these machines have three states (START,
>    OBF, END). They move from the START to OBF state when the first
>    non-padding cell is received on the circuit, and they stay in the OBF
>    state until all the padding gets depleted. The OBF state is controlled by
>    a histogram which specifies the parameters described in the paragraphs
>    above. After all the padding finishes, it moves to END state.
> 
>    We also set a special WTF-PAD flag which keeps the circuit open even after
>    the introduction is performed. In particular, with this feature the circuit
>    will stay alive for the same durations as normal web circuits before they
>    expire (usually 10 minutes).

I would make sure that the implementation here flags the circuit "Unusable"
after an introduction. If a client just re-picks it to introduce again
(let's say for a second SOCKS connection with a different user/pass), the
intro point will immediately tear it down, rendering this "keep open" feature
a bit pointless :(.
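
Roughly what I have in mind, as a toy model rather than tor code. The
`IntroCircuit` class, the `unusable_for_new_intro` flag and both helpers are
hypothetical names used only to show the selection logic.

```python
from dataclasses import dataclass

@dataclass
class IntroCircuit:
    circ_id: int
    # Hypothetical flag: set once an INTRODUCE1 has been sent on the circuit.
    unusable_for_new_intro: bool = False

def mark_introduced(circ):
    """Keep the circuit open for padding, but never pick it again."""
    circ.unusable_for_new_intro = True

def pick_intro_circuit(circuits):
    """Return the first circuit still usable for an introduction, or None."""
    for circ in circuits:
        if not circ.unusable_for_new_intro:
            return circ
    return None

circuits = [IntroCircuit(1), IntroCircuit(2)]
mark_introduced(circuits[0])
print(pick_intro_circuit(circuits).circ_id)
```

The padded circuit stays open, and a second introduction gets a different
(or new) circuit instead of reusing it.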

Cheers!
David

-- 
RvcA5t4gf8ZVGWkeAH8q2YX6s5pRuadzbdJisXSBhfA=