[tor-dev] WTF-PAD and the future

Sun Jul 29 13:42:43 UTC 2018

Mike Perry <mikeperry at torproject.org> writes:

> George Kadianakis:
>> Hello Mike,
>> 
>> I had a talk with Marc and Mohsen today about WTF-PAD. I now understand
>> much more about WTF-PAD and how it works with regard to histograms.  I
>> think I might even understand enough to start some sort of conversation
>> about it:
>> 
>> Here are some takeaways:
>> 
>> 1) Marc and Mohsen think that WTF-PAD might not be the way forward
>>    because of its various drawbacks and its complexity. Apparently there
>>    are various attacks on WTF-PAD that Roger has discovered (SENDME
>>    cells side-channels?) and also the deep learning crowd has done some
>>    pretty good damage to the WTF-PAD padding (90%-60% accuracy?). They
>>    also told me that achieving needed precision on the timings might be
>>    a PITA.
>
> Are there citations for any of this? Last I heard Matt Wright was
> working on a deep learning study but the results were mixed.
>

I think this is the best we have in terms of public results:
  https://arxiv.org/abs/1801.02265

>> 2) From what I understand you are also hoping to use WTF-PAD to protect
>>    against circuit fingerprinting and not just website
>>    fingerprinting. They told me that while this might be plausible,
>>    there is no current research on how well it can achieve that.  Are we
>>    hoping to do that? And what research remains here? How can I help?
>>    Which parts of the Tor circuit protocol are we hoping to hide?
>
> I am designing WTF-PAD to be a framework for deploying padding against
> arbitrary traffic analysis attacks. It is meant to allow us to define
> histograms on the fly (in the Tor consensus) as these are studied. The
> fact that they have not yet been studied is not super relevant to
> deploying the framework for it now.
>

ACK.

What other traffic analysis attacks are we looking at addressing here?

I'm thinking of stuff like "circuit fingerprinting of onion services",
but I wonder if histograms and random sampling are too crude to actually
be able to help against sophisticated attacks. I don't have a suggestion
for something better currently.
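
To make "histograms and random sampling" concrete, here is a minimal
sketch of drawing one padding delay from a histogram. It is not the tor
or wtfpad code: the bin edges, the token counts and the function name
are made up for illustration, and token removal/refill is left out.

```python
import bisect
import random

# Hypothetical histogram: bin edges in milliseconds, one token count per bin.
BIN_EDGES_MS = [0, 1, 2, 4, 8, 16, 32]   # 6 bins
BIN_TOKENS = [5, 10, 20, 10, 4, 1]


def sample_delay_ms(edges, tokens, rng):
    """Pick a bin with probability proportional to its token count,
    then pick a delay uniformly inside that bin."""
    total = sum(tokens)
    if total == 0:
        raise ValueError("histogram has no tokens")

    # Running totals, so one random integer selects a bin.
    cumulative = []
    running = 0
    for count in tokens:
        running += count
        cumulative.append(running)

    pick = rng.randrange(total)                   # 0 <= pick < total
    idx = bisect.bisect_right(cumulative, pick)   # bins with 0 tokens are skipped
    return rng.uniform(edges[idx], edges[idx + 1])


if __name__ == "__main__":
    rng = random.Random()
    print([round(sample_delay_ms(BIN_EDGES_MS, BIN_TOKENS, rng), 2)
           for _ in range(5)])
```

Everything the sampler knows is one independent delay at a time, which
is the "crude" part I am worried about.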

On that topic, is it decided whether the adaptive padding of WTF-PAD
will also happen during circuit construction, or only after that?

>> 3) Marc and Mohsen suggested using application-layer defences because
>>    the application-layer has much better view of the actual structures
>>    that are sent on the wire, instead of the black box view that the
>>    network layer has.
>> 
>>    In particular they were mainly concerned about onion services
>>    fingerprinting because they are part of a restricted closed world,
>>    whereas they were less concerned about the entire internet because of
>>    its vast size.
>> 
>>    They suggested that we could investigate using the service-side
>>    "alpaca" library for onion services (e.g. as part of securedrop?)
>>    which should resolve the most pressing concern of HS identification.
>
> I mean yeah application-layer defenses are useful for website traffic
> fingerprinting, but that is a very narrow slice of the traffic analysis
> problems that I want this framework to solve.
>
> WTF-PAD also doesn't rule out hidden service operators using alpaca,
> either. 
>

Agreed.

>> 4) They also told me of research by Tobias Pulls which eliminates the
>>    need for histograms in WTF-PAD and instead samples from the
>>    probability distribution directly. They think that this can simplify
>>    things somewhat. Any thoughts on this?
>
> Yes this is actually exactly what I want to do with the next iteration
> of WTF-PAD! The question is what form/model to use for these probability
> distributions. Right now we're encoding inter-burst and inter-packet
> timings with some weird geometric distribution determining how long
> these bursts should go on for, when it might be more natural to encode
> and sample from length-based distributions/histograms.
>
> (Histograms vs distribution is not the problem -- it's what they encode
> and how they encode it that matters).
>
> I don't see this paper on Tobias's website. Is it up anywhere yet?
>  

Hmm. Looking at the README of wtfpad (see the APE section), I think this
blog post is the best resource we have on this:
     https://www.cs.kau.se/pulls/hot/thebasketcase-ape/
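
For comparison with the histogram approach, here is a rough sketch of
what "sampling from the probability distribution directly" could look
like. The log-normal shape, the parameter names and the cap are my own
assumptions for illustration, not something taken from the blog post or
from wtfpad.

```python
import math
import random


def sample_delay_lognormal_ms(median_ms, sigma, rng, cap_ms=None):
    """Draw one delay (in ms) from a log-normal distribution.

    The distribution is described by two numbers instead of a list of
    bins: its median and the spread (sigma) of the underlying normal.
    """
    mu = math.log(median_ms)          # median of a log-normal is exp(mu)
    delay = rng.lognormvariate(mu, sigma)
    if cap_ms is not None:
        delay = min(delay, cap_ms)    # optional upper bound on the delay
    return delay


if __name__ == "__main__":
    rng = random.Random()
    print([round(sample_delay_lognormal_ms(4.0, 0.8, rng, cap_ms=100.0), 2)
           for _ in range(5)])
```

The appeal is that a couple of parameters would need to be shipped
rather than a full histogram, but what those parameters should encode
is still the open question Mike raises above.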