[tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD

Mon May 27 10:41:00 UTC 2019

David Goulet <dgoulet at torproject.org> writes:

> On 16 May (14:20:05), George Kadianakis wrote:
>
> Hello!
>
>> 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
>> 
>>    In this section we give an overview of what circuit construction looks like
>>    to a network or guard-level adversary. We use this knowledge to make the
>>    right padding machines that can make intro and rend circuits look like these
>>    general circuits.
>> 
>>    In particular, most general Tor circuits used to surf the web or download
>>    directory information start with the following 6-cell relay cell sequence (cells
>>    surrounded in [brackets] are outgoing, the others are incoming):
>> 
>>      [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
>> 
>>    When this is done, the client has established a 3-hop circuit and also
>>    opened a stream to the other end. Usually after this comes a series of DATA
>>    cells that either fetch pages, establish an SSL connection or fetch
>>    directory information:
>> 
>>      [DATA] -> [DATA] -> DATA -> DATA
>> 
>>    The above stream of 10 relay cells defines the grand majority of general
>>    circuits that come out of Tor browser during our testing, and it's what we
>>    are gonna use to make introduction and rendezvous circuits blend in.
>
> Considering "either fetches pages,..." is in the description, I'm confused how
> only 2 data cells is the grand majority?
>
> A simple "wget torproject.org" gives me an index.html of 16KB meaning at least
> 32 DATA cells. Even a directory fetch can't only be 2 data cells... ?
>

Perhaps I should have made it more clear but the pattern:

        [DATA] -> [DATA] -> DATA -> DATA -> ...

comes from the SSL handshake that happens in most general circuits. In
particular the first two [DATA] cells are the ClientHello etc. SSL
records that get sent by the client, and then the subsequent DATA cells
are the ServerHello etc. of the server.

>> 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
>> 
>>    These two machines are meant to hide client-side introduction circuits. The
>>    origin-side machine sits on the client and sends padding towards the
>>    introduction circuit, whereas the relay-side machine sits on the middle-hop
>>    (second hop of the circuit) and sends padding towards the client. The
>>    padding from the origin-side machine terminates at the middle-hop and does
>>    not get forwarded to the actual introduction point.
>> 
>>    Both of these machines only get activated for introduction circuits, and
>>    only after an INTRODUCE1 cell has been sent out.
>> 
>>    This means that before the machine gets activated our cell flow looks like this:
>> 
>>     [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
>> 
>>    Comparing the above with section [CIRCCONSTRUCTION], we see that the above
>>    cell sequence matches the one from general circuits up to the first 7 cells.
>> 
>>    However, in normal introduction circuits this is followed by an
>>    INTRODUCE_ACK and then the circuit gets torn down, which does not match
>>    the sequence from [CIRCCONSTRUCTION].
>> 
>>    Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
>>    send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
>>    cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
>>    sequence up to the first 10 cells.
>> 
>>    After that, we continue sending padding from the relay-side machine so as to
>>    fake a directory download, or an SSL connection setup. We also want to
>>    continue sending padding so that the connection stays up longer to destroy
>>    the "Duration of Activity" fingerprint.
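
To make the prefix comparison above concrete, here is a minimal Python sketch. It assumes the adversary only sees the direction of each cell ("out" for the [bracketed] cells, "in" for the others), since cell contents are encrypted. The list names and the helper `matching_prefix_len` are hypothetical and only serve the illustration; they are not part of the implementation.

```python
# Direction-only view of the cell sequences quoted in the proposal.
# "out" = sent by the client ([bracketed] cells), "in" = received by the client.
SETUP_3HOP = [("EXTEND2", "out"), ("EXTENDED2", "in"),
              ("EXTEND2", "out"), ("EXTENDED2", "in")]

# General circuit from [CIRCCONSTRUCTION]: 6 setup cells plus 4 DATA cells.
GENERAL = SETUP_3HOP + [
    ("BEGIN", "out"), ("CONNECTED", "in"),
    ("DATA", "out"), ("DATA", "out"), ("DATA", "in"), ("DATA", "in"),
]

# Client intro circuit up to and including INTRODUCE1 (7 cells).
INTRO_PREFIX = SETUP_3HOP + [
    ("EXTEND2", "out"), ("EXTENDED2", "in"), ("INTRODUCE1", "out"),
]

# Without padding, INTRODUCE_ACK comes next and the circuit is torn down.
INTRO_UNPADDED = INTRO_PREFIX + [("INTRODUCE_ACK", "in")]

# With the machines active, the negotiation cells fill the gap.
INTRO_PADDED = INTRO_PREFIX + [
    ("PADDING_NEGOTIATE", "out"),
    ("PADDING_NEGOTIATED", "in"),
    ("INTRODUCE_ACK", "in"),
]


def matching_prefix_len(a, b):
    """Count how many leading cells of a and b travel in the same direction."""
    count = 0
    for (_, dir_a), (_, dir_b) in zip(a, b):
        if dir_a != dir_b:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(matching_prefix_len(GENERAL, INTRO_UNPADDED))
    print(matching_prefix_len(GENERAL, INTRO_PADDED))
```

Under these assumptions the unpadded intro circuit diverges from the general pattern at the 8th cell, while the padded one follows it for all 10 cells.
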
>
> I've looked at the implementation quickly and these DROP cells aren't
> accounted for in our circuit flow control which means that there will be a
> difference between a "real" DATA circuit and a circuit being sent PADDING in
> order to look like the former. And that will be the flow control cell(s)
> (SENDME) coming back from the end point that is receiving the data.
>
> In other words, one circuit (the padded one) will have only a long stream of
> cells going in one direction and the second circuit (with legit data) will
> have that long stream but now and then a cell coming back down the circuit.
>
> I believe this is quite the distinguisher between any circuit seeing much
> padding and one that doesn't? :S
>

I think you are right, but I don't think that these padded intro circuits
will stay open for long enough to need a SENDME cell from the client to
the relay. In particular, the client will receive about 15 cells before
the intro circuit gets torn down.
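
As a rough sanity check of that claim, here is a small Python sketch. It assumes the SENDME increments from tor-spec (one circuit-level SENDME per 100 DATA cells received, one stream-level SENDME per 50) and treats the "about 15 cells" figure above as the total number of cells received. The function name `sendmes_triggered` is made up for this example.

```python
# Flow-control increments as given in tor-spec (assumed here, not read
# from the implementation).
CIRCWINDOW_INCREMENT = 100   # DATA cells per circuit-level SENDME
STREAMWINDOW_INCREMENT = 50  # DATA cells per stream-level SENDME


def sendmes_triggered(cells_received, increment):
    """Number of SENDMEs a receiver would emit after this many DATA cells."""
    return cells_received // increment


if __name__ == "__main__":
    cells = 15  # approximate cells received on a padded intro circuit
    print(sendmes_triggered(cells, CIRCWINDOW_INCREMENT))
    print(sendmes_triggered(cells, STREAMWINDOW_INCREMENT))
```

So even a real DATA circuit of that length would not have produced a SENDME yet, which is why the missing SENDME should not stand out on circuits this short.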

>> 
>>    To calculate the padding overhead, we see that the origin-side machine just
>>    sends a single [PADDING_NEGOATIATE] cell, whereas the relay-side machine
>
> Typo here "PADDING_NEGOATIATE".
>

Yep. Will fix soon.

>>    sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means
>>    that the average overhead of this machine is 11 padding cells.
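
The arithmetic behind that figure can be sketched as follows. This assumes the number of DROP cells is uniform over 7, 8, 9 and 10, which the text above does not state explicitly; the function name `average_overhead` is hypothetical.

```python
from statistics import mean

NEGOTIATE_CELLS = 1    # [PADDING_NEGOTIATE] from the origin side
NEGOTIATED_CELLS = 1   # PADDING_NEGOTIATED from the relay side
DROP_CELL_COUNTS = range(7, 11)  # 7..10 inclusive, assumed uniform


def average_overhead():
    """Expected number of padding-related cells per intro circuit."""
    return NEGOTIATE_CELLS + NEGOTIATED_CELLS + mean(DROP_CELL_COUNTS)


if __name__ == "__main__":
    print(average_overhead())
```

Under that assumption the expected overhead is 10.5 cells, which the proposal rounds up to 11.
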
>> 
>>    In terms of WTF-PAD terminology, these machines have three states (START,
>>    OBF, END). They move from the START to OBF state when the first
>>    non-padding cell is received on the circuit, and they stay in the OBF
>>    state until all the padding gets depleted. The OBF state is controlled by
>>    a histogram which specifies the parameters described in the paragraphs
>>    above. After all the padding finishes, it moves to END state.
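
The three-state behaviour described above can be sketched as a toy Python state machine. This is only an illustration: it ignores the histogram's timing parameters and models the padding as a simple cell budget, and the class and method names (`PaddingMachine`, `on_nonpadding_received`, `on_padding_timer`) are invented here, not taken from the circuitpadding code.

```python
from enum import Enum


class State(Enum):
    START = "START"
    OBF = "OBF"
    END = "END"


class PaddingMachine:
    """Toy START -> OBF -> END machine with a fixed padding budget."""

    def __init__(self, padding_budget):
        self.state = State.START
        self.remaining = padding_budget

    def on_nonpadding_received(self):
        # The first non-padding cell moves the machine out of START.
        if self.state is State.START:
            self.state = State.OBF if self.remaining > 0 else State.END

    def on_padding_timer(self):
        """Try to send one padding cell; return True if one was sent."""
        if self.state is not State.OBF:
            return False
        self.remaining -= 1
        if self.remaining == 0:
            # All padding is depleted, so the machine is done.
            self.state = State.END
        return True


if __name__ == "__main__":
    machine = PaddingMachine(padding_budget=8)
    machine.on_nonpadding_received()
    sent = sum(machine.on_padding_timer() for _ in range(20))
    print(sent, machine.state.name)
```

In the real machines the OBF state would also draw inter-cell delays from the histogram; the sketch only captures the transitions.
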
>> 
>>    We also set a special WTF-PAD flag which keeps the circuit open even after
>>    the introduction is performed. In particular, with this feature the circuit
>>    will stay alive for the same durations as normal web circuits before they
>>    expire (usually 10 minutes).
>
> I would make sure that the implementation here flags the circuit "Unusable"
> after an introduction since if a client just repicks it to introduce again
> (let's say a second SOCKS connection with a different user/pass), then the intro
> point will immediately tear it down rendering this "keep open" feature a bit
> pointless :(.
>

I think this is already the case because we repurpose these "keep-alive"
circuits as a separate circuit purpose (CIRCUIT_PURPOSE_C_PADDING), and
hence they should not be re-used as intro circuits by the client.

I should check again tho.

Thanks for the feedback! :)
Will send a fresh version of the proposal back to the ML soon!