[tor-dev] Proposal 302: Hiding onion service clients using WTF-PAD
George Kadianakis
desnacked at riseup.net
Mon May 27 10:41:00 UTC 2019
David Goulet <dgoulet at torproject.org> writes:
> On 16 May (14:20:05), George Kadianakis wrote:
>
> Hello!
>
>> 4.1. A dive into general circuit construction sequences [CIRCCONSTRUCTION]
>>
>> In this section we give an overview of what circuit construction looks like
>> to a network or guard-level adversary. We use this knowledge to make the
>> right padding machines that can make intro and rend circuits look like these
>> general circuits.
>>
>> In particular, most general Tor circuits used to surf the web or download
>> directory information start with the following 6-cell relay cell sequence (cells
>> surrounded in [brackets] are outgoing, the others are incoming):
>>
>> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [BEGIN] -> CONNECTED
>>
>> When this is done, the client has established a 3-hop circuit and also
>> opened a stream to the other end. Usually after this comes a series of DATA
>> cell that either fetches pages, establishes an SSL connection or fetches
>> directory information:
>>
>> [DATA] -> [DATA] -> DATA -> DATA
>>
>> The above stream of 10 relay cells defines the grand majority of general
>> circuits that come out of Tor browser during our testing, and it's what we
>> are going to use to make introduction and rendezvous circuits blend in.
>
> Considering "either fetches pages,..." is in the description, I'm confused how
> only 2 data cells is the grand majority?
>
> A simple "wget torproject.org" gives me an index.html of 16KB meaning at least
> 32 DATA cells. Even a directory fetch can't only be 2 data cells... ?
>
Perhaps I should have made it clearer, but the pattern:

  [DATA] -> [DATA] -> DATA -> DATA -> ...

comes from the SSL handshake that happens in most general circuits. In
particular, the first two [DATA] cells are the ClientHello etc. SSL
records sent by the client, and the subsequent DATA cells are the
ServerHello etc. records sent by the server.
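
To make the pattern concrete, here is a rough Python sketch (not code from
the proposal or from tor) that reduces each cell sequence quoted in this
thread to the direction pattern an adversary would observe, and measures how
long an intro circuit matches the general-circuit pattern. The helper names
are made up for illustration, and the padded sequence assumes the
INTRODUCE1 / PADDING_NEGOTIATE / PADDING_NEGOTIATED / INTRODUCE_ACK ordering
described in section 5.1 of the proposal.

```python
OUT, IN = "out", "in"

def directions(cells):
    """Map cell names to directions; [bracketed] cells are outgoing."""
    return [OUT if cell.startswith("[") else IN for cell in cells]

def common_prefix_len(a, b):
    """Number of leading positions where the two direction lists agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Cell sequences as quoted in the proposal text.
GENERAL = ("[EXTEND2] EXTENDED2 [EXTEND2] EXTENDED2 [BEGIN] CONNECTED "
           "[DATA] [DATA] DATA DATA").split()
INTRO_PLAIN = ("[EXTEND2] EXTENDED2 [EXTEND2] EXTENDED2 [EXTEND2] EXTENDED2 "
               "[INTRODUCE1] INTRODUCE_ACK").split()
INTRO_PADDED = ("[EXTEND2] EXTENDED2 [EXTEND2] EXTENDED2 [EXTEND2] EXTENDED2 "
                "[INTRODUCE1] [PADDING_NEGOTIATE] PADDING_NEGOTIATED "
                "INTRODUCE_ACK").split()

general_dirs = directions(GENERAL)
plain_match = common_prefix_len(general_dirs, directions(INTRO_PLAIN))
padded_match = common_prefix_len(general_dirs, directions(INTRO_PADDED))

print("unpadded intro circuit matches first", plain_match, "cells")
print("padded intro circuit matches first", padded_match, "cells")
```

Under these assumptions the unpadded intro circuit diverges at the 8th cell
(the INTRODUCE_ACK comes back where a general circuit sends a second
outgoing cell), while the padded one follows the general pattern for all 10
cells.
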
>> 5.1. Client-side introduction circuit hiding machines [INTRO_CIRC_HIDING]
>>
>> These two machines are meant to hide client-side introduction circuits. The
>> origin-side machine sits on the client and sends padding towards the
>> introduction circuit, whereas the relay-side machine sits on the middle-hop
>> (second hop of the circuit) and sends padding towards the client. The
>> padding from the origin-side machine terminates at the middle-hop and does
>> not get forwarded to the actual introduction point.
>>
>> Both of these machines only get activated for introduction circuits, and
>> only after an INTRODUCE1 cell has been sent out.
>>
>> This means that before the machine gets activated our cell flow looks like this:
>>
>> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [EXTEND2] -> EXTENDED2 -> [INTRODUCE1]
>>
>> Comparing the above with section [CIRCCONSTRUCTION], we see that the above
>> cell sequence matches the one from general circuits up to the first 7 cells.
>>
>> However, in normal introduction circuits this is followed by an
>> INTRODUCE_ACK and then the circuit gets torn down, which does not match
>> the sequence from [CIRCCONSTRUCTION].
>>
>> Hence when our machine is used, after sending an [INTRODUCE1] cell, we also
>> send a [PADDING_NEGOTIATE] cell, which gets answered by a PADDING_NEGOTIATED
>> cell and an INTRODUCE_ACK cell. This makes us match the [CIRCCONSTRUCTION]
>> sequence up to the first 10 cells.
>>
>> After that, we continue sending padding from the relay-side machine so as to
>> fake a directory download, or an SSL connection setup. We also want to
>> continue sending padding so that the connection stays up longer to destroy
>> the "Duration of Activity" fingerprint.
>
> I've looked at the implementation quickly and these DROP cells aren't
> accounted for in our circuit flow control which means that there will be a
> difference between a "real" DATA circuit and a circuit being sent PADDING in
> order to look like the former. And that will be the flow control cell(s)
> (SENDME) coming back from the end point that is receiving the data.
>
> In other words, one circuit (the padded one) will have only a long stream of
> cells going in one direction and the second circuit (with legit data) will
> have that long stream but now and then a cell coming back down the circuit.
>
> I believe this is quite the distinguisher between any circuit seeing much
> padding and one that doesn't? :S
>
I think you are right, but I don't think these padded intro circuits
will stay open long enough to need a SENDME cell from the client to
the relay. In particular, the client will receive about 15 cells before
the intro circuit gets torn down.
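
As a back-of-the-envelope check, here is a small Python sketch of that
arithmetic. The two constants are my recollection of tor-spec (498 bytes of
data per RELAY_DATA cell, one circuit-level SENDME per 100 DATA cells
received) and are not stated anywhere in this thread, so treat them as
assumptions; the function names are invented for the example.

```python
import math

# Assumed values recalled from tor-spec, not taken from this thread.
RELAY_PAYLOAD_BYTES = 498       # max data bytes carried by one RELAY_DATA cell
CIRC_SENDME_INCREMENT = 100     # DATA cells received per circuit-level SENDME

def data_cells_for(n_bytes):
    """DATA cells needed to carry n_bytes of stream data."""
    return math.ceil(n_bytes / RELAY_PAYLOAD_BYTES)

def circuit_sendmes_expected(n_data_cells):
    """Circuit-level SENDMEs a receiver would send back for this many cells."""
    return n_data_cells // CIRC_SENDME_INCREMENT

# A real circuit carrying the 16KB index.html from David's example.
wget_cells = data_cells_for(16 * 1024)

# A padded intro circuit where the client receives about 15 cells.
padded_cells = 15

print(wget_cells, circuit_sendmes_expected(wget_cells))
print(padded_cells, circuit_sendmes_expected(padded_cells))
```

If those constants are right, neither a ~15 cell padded intro circuit nor a
16KB fetch reaches the point where a circuit-level SENDME would be expected,
so the missing SENDME only becomes a distinguisher for much longer padding
runs.
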
>>
>> To calculate the padding overhead, we see that the origin-side machine just
>> sends a single [PADDING_NEGOATIATE] cell, whereas the relay-side machine
>
> Typo here "PADDING_NEGOATIATE".
>
Yep. Will fix soon.
>> sends a PADDING_NEGOTIATED cell and between 7 and 10 DROP cells. This means
>> that the average overhead of this machine is 11 padding cells.
>>
>> In terms of WTF-PAD terminology, these machines have three states (START,
>> OBF, END). They move from the START to OBF state when the first
>> non-padding cell is received on the circuit, and they stay in the OBF
>> state until all the padding gets depleted. The OBF state is controlled by
>> a histogram which specifies the parameters described in the paragraphs
>> above. After all the padding finishes, it moves to END state.
>>
>> We also set a special WTF-PAD flag which keeps the circuit open even after
>> the introduction is performed. In particular, with this feature the circuit
>> will stay alive for the same durations as normal web circuits before they
>> expire (usually 10 minutes).
>
> I would make sure that the implementation here flags the circuit "Unusable"
> after an introduction since if a client just repicks it to introduce again
> (let's say a second SOCKS connection with a different user/pass), then the intro
> point will immediately tear it down rendering this "keep open" feature a bit
> pointless :(.
>
I think this is already the case because we repurpose these "keep-alive"
circuits as a separate circuit purpose (CIRCUIT_PURPOSE_C_PADDING), and
hence they should not be re-used as intro circuits by the client.
I should check again tho.

Thanks for the feedback! :)

Will send a fresh version of the proposal back to the ML soon!