[tor-dev] TOP SECRET BULLETIN ABOUT THE RACCOON EFFECT
Mike Perry
mikeperry at torproject.org
Wed Dec 23 04:15:19 UTC 2020
On 12/22/20 7:58 PM, The23rd Raccoon wrote:
> Recent advances in traffic analysis defenses have finally proved that my
> controversial but revolutionary theories[5,6] are valid, overturning
> decades of theories about traffic analysis attacks against anonymity
> networks!
>
> Ok ok, so I might have made a mistake in the math[7]. But I'm just a
> Raccoon who reads discarded academic research papers in a dumpster.
> While I have been highly educated through my dumpster schooling, one
> can't expect raccoons to do math correctly. Such math is best left to
> others who can properly express my theory in terms of equations.
I am glad you liked the papers!
> Others like Panchenko, Pulls, Danezis, Kadianakis, et al; and maybe
> Perry. (But probably not Perry.)
Hey, I can MATH!
As Tor's Research Janitor, I confirm that your bulletin contains valid
novel ideas, and they are very testable (see below). This is insanely
great! I wish I had thought of it!
In fact, unification of correlation and fingerprinting, along with the
unification and combination of defenses, is an entire research area,
with many possible paper topics.
> Recently, Panchenko et al found traffic splitting to be highly effective
> against state of the art Website Traffic Fingerprinting attacks based on
> deep learning[24].
>
> Concurrently, Tobias Pulls used Perry and Kadianakis's padding machines
> in an optimization problem, using a Genetic Algorithm to evolve optimal
> padding machines against deep learning classifiers, for use in defense
> against Website Traffic Fingerprinting[25]. With this result, we have
> finally entered the age of the machines versus the machines. Raccoon
> math, while groundbreaking, is no longer necessary.
>
> Both of these defenses were highly successful on their own.
Pulls's methodology in your reference[25] was exemplary. Using the
circpad simulator and the circpad framework allows us to rapidly and
directly deploy exact research solutions on the Tor network, as-is.
In fact, we could deploy the GA-generated machine specifications in his
paper on live Tor relays today.
We will need to re-tune everything once congestion control and conflux
are deployed, and when timing is involved, so I think the best plan is to
have another round or two of research into optimizing and tuning for
that scenario.
> However, with the combination of traffic splitting and cover traffic
> defenses, Tor will be on the CUTTING EDGE of making a PARADIGM SHIFT in
> its threat model, to tackle the hardest problem of all: END-TO-END
> TRAFFIC CORRELATION.
I also agree that the combination should require less overhead and
deliver better performance than either one by itself. Obviously, testing
this is a very promising research area. I encourage full collaboration
between Pulls, Panchenko, Tor, wild raccoons, and others, in this area.
For those who are considering studying this, see:
https://gitlab.torproject.org/mikeperry/torspec/-/blob/ticket40202_01/proposals/329-traffic-splitting.txt
We are optimizing that using congestion control, to achieve high-speed
low-latency traffic splitting, to exit relays and onion services. We
will likely only use 2 circuits, to reduce exposure to guard relays with
respect to other potential attacks, so some padding overhead will likely
still be necessary.
The combination could also be tuned to help reduce the overhead needed
by padding, in an optimization problem context, like Pulls's GA.
I will be updating that draft with more information as the proposal
solidifies.
Note to those from the future: this proposal draft link will
eventually be merged to the torspec repo. Check for the final version
here:
https://gitlab.torproject.org/tpo/core/torspec/-/tree/master/proposals
> Indeed, once time is included as a feature, deep learning based Website
> Traffic Fingerprinting attacks will effectively be correlating the
> timing and traffic patterns of websites to their representations in its
> neural model. This model comparison is extremely similar to how
> end-to-end correlation compares the timing and traffic patterns of Tor
> entrance traffic to Tor exit traffic. In fact, deep learning classifiers
> have already shown success in correlating end-to-end traffic on Tor[28].
While you have offered no specific testable predictions for this theory,
presumably to score more crackpot points, allow me to provide a
reduction proof sketch, as well as an easily testable result.
To see that Deep Fingerprinting reduces to Deep Correlation, consider
the construction where the correlator function from DeepCorr is used to
correlate pairs of raw test traces to the raw training traces that were
used to train the Deep Fingerprinting classifier. The correlated pairs
would be constructed from the monitored set's test and training
examples. This means that instead of correlating client traffic to Exit
traffic, DeepCorr is correlating "live" client traces directly to the
raw fingerprinting training model, as you said.
This gets us "closed world" fingerprinting results. For "open world"
results, include the unmonitored set as input that does not contain
matches (to represent partial network observation that results in
unmatched pairs).
If the accuracy from this DeepCorr Fingerprinting construction is better
than Deep Fingerprinting for closed and open world scenarios, one can
conclude that Deep Fingerprinting reduces to DeepCorr, in a
computational complexity and information-theoretic sense. This is
testable.
If the accuracy is worse, then Deep Fingerprinting is actually a more
powerful attack than DeepCorr, and thus defenses against Deep
Fingerprinting should perform even better against DeepCorr, for web
traffic. This is also testable.
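
The construction and its two outcomes can be sketched in a few lines of
Python. This is only a toy illustration under loud assumptions:
toy_correlator is a hypothetical stand-in for the trained DeepCorr
correlator (it just counts matching packet directions), the traces are
made-up direction sequences, and the threshold, the function names, and
the "inconclusive" tie case are mine, not from either paper.

```python
def toy_correlator(live, ref):
    """Hypothetical stand-in for DeepCorr: score in [0, 1] for a trace pair."""
    n = min(len(live), len(ref))
    if n == 0:
        return 0.0
    # Fraction of positions where the packet directions agree.
    return sum(1 for x, y in zip(live, ref) if x == y) / n


def classify(live, train, correlator, threshold=0.5):
    """Correlate one live trace against every raw training trace.

    train is a list of (site_label, trace) pairs from the monitored set.
    Returns the best-matching site, or None when no pair scores above the
    threshold (treated as "unmonitored" in the open world case).
    """
    best_site, best_score = None, threshold
    for site, ref in train:
        score = correlator(live, ref)
        if score > best_score:
            best_site, best_score = site, score
    return best_site


def accuracy(test, unmonitored, train, correlator, threshold=0.5):
    """Closed world: pass unmonitored=[]. Open world: add unmatched traces."""
    cases = [(trace, site) for site, trace in test]
    cases += [(trace, None) for trace in unmonitored]
    correct = sum(
        1 for trace, truth in cases
        if classify(trace, train, correlator, threshold) == truth
    )
    return correct / len(cases)


def interpret(acc_deepcorr_fp, acc_deep_fp):
    """Map the accuracy comparison onto the two outcomes described above."""
    if acc_deepcorr_fp > acc_deep_fp:
        return "Deep Fingerprinting reduces to DeepCorr"
    if acc_deepcorr_fp < acc_deep_fp:
        return "Deep Fingerprinting is the more powerful attack"
    return "inconclusive"  # assumption: the tie case is not covered above


# Made-up traces: +1 is an outgoing cell, -1 is an incoming cell.
train = [("a", [1, 1, -1, -1]), ("b", [1, -1, 1, -1])]
test = [("a", [1, 1, -1, 1])]
unmonitored = [[-1, -1, 1, 1]]

closed_world = accuracy(test, [], train, toy_correlator)
open_world = accuracy(test, unmonitored, train, toy_correlator)
```

A real experiment would swap toy_correlator for the trained DeepCorr
model, use the actual monitored and unmonitored datasets, and feed the
measured Deep Fingerprinting accuracy into interpret().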
This reduction also makes sense intuitively. The most powerful
correlation and fingerprinting attacks now use CNNs under the hood. So
they should both have the same expressive power, and inference
capability.
Interestingly, the dataset that Pulls used was significantly larger than
what DeepCorr used, in terms of "pairs" that must be matched.
More interestingly, DeepCorr also found that truncating flows to the
initial portion was still sufficient for high accuracy. Pulls's
defenses also found that the beginning of website traces was most
important to pad heavily.
> Some say that Long Term Statistical Disclosure (LTSD) attacks will still
> always win the end-to-end correlation game against anonymity networks,
> in the fullness of time[29].
>
> However, LTSD attacks are only a theory. And much like quantum
> mechanics, relativity, and LSD, these attacks also warp one's perception
> of reality, time, and space. All of these theories are fundamentally
> misguided.
>
> LTSD attacks predict that over time, correlation gradually leaks enough
> information to fully deanonymize users of anonymity networks. But also
> much like quantum mechanics, they fail to fully define the mechanism.
>
> Consider this thought experiment (feel free to use whatever mind
> expanding devices you have at hand to assist you): LTSD assumes that an
> adversary has complete high resolution information of all traffic that
> enters and exits an anonymity network. Additionally, LTSD assumes that
> an adversary has identifiers available to properly track traffic streams
> on *each* side of the correlation, over the full duration of observation
> and long-term correlation.
For a more modern treatment of LTSD-like correlation attack theory, see
The Anonymity Trilemma: https://eprint.iacr.org/2017/954.pdf
Even so, all of the limitations you have identified still apply. Some
have been incorporated into the theory and indeed show a decrease in
efficacy, but others have still not been accounted for!
As I said in the circpad framework documentation, I prefer an empirical
approach to pure formalism, for this reason. I agree that it looks like
we can do much better than today, for a realistic amount of overhead.
All of that said, anonymity is a complicated problem. As your earlier
posts indicate: targeting, stylometry, and mailinglist post timing can
degrade anonymity in surprising ways. The Raccoon Effect only works if
we have enough raccoons who behave and look alike, and are exceedingly
careful about it. The machines can do much more than correlate traffic
patterns, these days!
> This by itself is a huge win. We can now say with certainty that The
> Raccoon Effect has thoroughly discromulated correlation attacks.
>
> (Discromulation is my term to describe what this kind of defense does.
> Most interestingly, I am forced into winning this crackpot point.
> Because deep learning is an opaque machine generated attack, and because
> the GA-optimized defense is also machine generated, it is actually
> impossible to precisely describe the complete behaviors of either one,
> other than with the resulting model definitions themselves! Brave new
> world.)
This *is* interesting. Pulls also pointed this out in his paper. This is
another reason why it seems better to rely on reproducible empirical
methods, rather than pure formalism.
> Now, what about alien intervention? Well, assuming we do not consider
> the AI that participated in this work to be alien: if aliens did
> intervene, none would argue the discromulating conflugruity of The
> Raccoon Effect. Unfortunately however, I can neither confirm nor deny
> these allegations[34], at this time[35].
The fact that Pulls's AI named itself 'Interspace' has me curious and
eager to subscribe to your newsletter!
> But that's not all! Since the circuit padding framework is implemented
> in Tor, this means that it is covered by Tor's bug bounty. While
> research papers that break padding defenses are not covered by the
> bounty (especially if those defenses are not actually deployed), there
> *is* in fact prize money for any flaws found in the framework that could
> lead to code execution, or deanonymization[36].
Unfortunately, when OTF lost funding due to the Trump administration's
desire to fund closed source Internet Freedom tools, we also lost our
OTF funding for this bug bounty, and had to temporarily suspend it while
we look for a new sponsor.
However, to keep you honest (and preserve your crackpot points), I will
personally honor the bounty for any bugs found in the circpad framework,
as deployed in Tor, that lead to code execution or full deanonymization,
as a result of that code (excluding correlation and fingerprinting
attacks, until we deploy strong defenses). It is mostly my code anyway,
and I doubt George Kadianakis made any mistakes.
If anyone wants to help support Tor's ability to make progress on these
types of problems, please consider donating:
https://donate.torproject.org/
> P.P.P.S. At 1004 points on the crackpot index, I believe this post is
> now the highest scoring publication with a valid novel idea that has
> been written, to date[2].
If it helps to get a raccoon into the world record books: I again
confirm this is a valid, novel idea. I have kept John Baez on Cc for
this reason. We should probably take him off after this :).
> P.P.P.P.S. Fucking bored as fuck during this fucking pandemic. Fuck![42]
I hear you. To help pass the time until the aliens reveal themselves,
I've made a playlist:
https://open.spotify.com/playlist/5iYQ0BZNEOaoRhf8Pydvqp
> 1. https://math.ucr.edu/home/baez/crackpot.html
> 2. https://www.reddit.com/r/math/comments/4r05wh/has_anyone_with_a_high_crackpot_index_score_every/
> 3. https://en.m.wikipedia.org/wiki/Betteridge%27s_law_of_headlines
> 4. http://www.stinkymeat.net/
> 5. https://archives.seul.org/or/dev/Mar-2012/msg00019.html - Raccoon23 Post1
> 6. https://archives.seul.org/or/dev/Sep-2008/msg00016.html - Raccoon23 Post2
> 7. https://conspicuouschatter.wordpress.com/2008/09/30/the-base-rate-fallacy-and-the-traffic-analysis-of-tor/
> 8. https://fahrplan.events.ccc.de/congress/2006/Fahrplan/speakers/1242.en.html
> 9. https://awards.acm.org/award_winners/syverson_5067587
> 10. https://web.archive.org/web/20121130072122/http://www.foreignpolicy.com/articles/2012/11/26/the_fp_100_global_thinkers?page=0,48
> 11. https://lists.torproject.org/pipermail/tor-dev/2008-September/002493.html
> 12. https://en.wikipedia.org/wiki/Simulation_hypothesis#The_simulation_argument
> 13. https://en.wikipedia.org/wiki/Alcubierre_drive
> 14. https://www.bbc.com/news/world-europe-36173247 - Weasel takes down LHC
> 15. https://www.slashgear.com/elon-musk-has-banned-hot-tub-talks-about-simulated-existence-03442784/
> 16. https://www.forbes.com/sites/janetwburns/2016/10/13/elon-musk-and-friends-are-spending-millions-to-break-out-of-the-matrix/
> 17. https://www.youtube.com/watch?v=qLcma0YyzhY - Elon Musk Flame Thrower
> 18. https://www.yogonet.com/international/noticias/2020/12/07/55695-boring-company-approved-to-expand-its-tunnel-to-encore-at-wynn-las-vegas
> 19. https://www.inverse.com/innovation/tesla-electric-jet-3-4-years-away
> 20. https://blog.torproject.org/critique-website-traffic-fingerprinting-attacks
> 21. https://github.com/torproject/tor/blob/master/doc/HACKING/CircuitPaddingDevelopment.md
> 22. https://arxiv.org/pdf/1801.02265.pdf - Deep Fingerprinting Tor
> 23. https://www.youtube.com/watch?v=TvjMr6DU7C8 - Raccoon call
> 24. https://www.comsys.rwth-aachen.de/fileadmin/papers/2020/2020-delacadena-trafficsliver.pdf
> 25. https://arxiv.org/abs/2011.13471 - Pulls GA Defense
> 26. https://abcnews.go.com/blogs/headlines/2014/05/ex-nsa-chief-we-kill-people-based-on-metadata
> 27. https://www.full-thesis.net/fragments-of-energy-not-waves-or-particles-may-be-the-fundamental-building-blocks-of-the-universe/4418/
> 28. https://people.cs.umass.edu/~amir/papers/CCS18-DeepCorr.pdf
> 29. https://www.freehaven.net/anonbib/cache/statistical-disclosure.pdf
> 30. https://transparencyreport.google.com/https/overview?hl=en
> 31. https://en.wikipedia.org/wiki/Accelerando#Characters
> 32. https://fahrplan.events.ccc.de/congress/2006/Fahrplan/attachments/1167-SpeakingAnonymously.pdf
> 33. https://www.youtube.com/watch?v=jSRfIMjvtFk - Raccoons and cats <3
> 34. https://edition.cnn.com/2020/04/27/politics/pentagon-ufo-videos/index.html
> 35. https://www.nbcnews.com/news/weird-news/former-israeli-space-security-chief-says-extraterrestrials-exist-trump-knows-n1250333
> 36. https://hackerone.com/torproject
> 37. https://www.msn.com/en-ie/news/coronavirus/during-a-pandemic-isaac-newton-had-to-work-from-home-too-he-used-the-time-wisely/ar-BB118Jyp
> 38. https://www.youtube.com/watch?v=Ofp26_oc4CA - Raccoons are Legion
> 39. https://www.usenix.org/conference/usenixsecurity21/artifact-evaluation-information
> 40. https://petsymposium.org/artifacts.php
> 41. https://en.wikipedia.org/wiki/Liar_paradox
> 42. https://www.youtube.com/watch?v=04_rIuVc_qM - WTF
This is an auspicious number of top-tier references!
> For Karsten:
> https://cs5.livemaster.ru/storage/3a/1f/1449eb23f3c3b318ab4960815fn4--watercolour-watercolor-sad-raccoon.jpg
It is comforting to know that Karsten had friends even among the
raccoons. Probably among the aliens too.
--
Mike Perry