How I Learned to Stop Ph34ring NSA and Love the Base Rate Fallacy

The23rd Raccoon the.raccoon23 at gmail.com
Sun Sep 28 12:27:11 UTC 2008


Abstract

Ladies and gentlemen, I am a Raccoon who has learned to do math.
It used to be the case that on the Internet, nobody knew you were a
Raccoon. But now that Privacy is dead, I've decided to come clean.

I present to you this anonymously authored, non-peer reviewed
communication to do with what you will. Should anyone actually cite
this work in a published paper, I will ask my brethren to leave their
garbage cans unmolested for the rest of their days.


Introduction

This post performs some basic analysis of the utility of timing
correlation attacks against a moderately used anonymous network,
specifically with respect to the Base Rate Fallacy[1] of Bayesian
statistics. Via that same analysis, it also for the first time begins to
quantify the utility that additional users bring to a low latency
anonymous network in terms of resistance to timing attacks.

You see, one day I was rifling through the local University dumpster
looking for a free meal, and I stumbled upon a pile of discarded
conference proceedings which I decided to peruse while I dined. After
a while, it became apparent that many papers dealing with timing and
correlation attacks completely ignore the Base Rate Fallacy and the
effect of larger user bases and sample sizes on their results.

Worse, while it may be possible that some papers actually DO report
Bayesian Detection Rate probabilities, those that do never specify
this fact, making their work impossible to differentiate from less
appetizing dumpster morsels.

It was enough to get this Raccoon down, and I survive contently on
moldy bread and discarded hot dogs[2]!


Event Notation and Grounding

Most timing attack papers deal with the true positive rate and the
false positive rate of detection. So let's establish some symbols
and formalize these two rates.

M = Two chosen entry and exit streams Match (are the same stream)
~M = Two chosen entry and exit streams Don't Match (are not the same)
C = Packet/stream timing Correlation predicts a pair is matching
~C = Packet/stream timing Correlation predicts a pair is Non-matching


True Positive Rate: P(C|M)
False Negative Rate: P(~C|M) = 1-P(C|M)

True Negative Rate: P(~C|~M) = 1-P(C|~M)
False Positive Rate: P(C|~M)


Bayesian Detection Rate Derivation

Given a purely hypothetical network with 250,000 concurrent users
generating approximately 5000 network-wide concurrent streams and a
99.9% accurate (Equal Error Rate) generic correlation detector, find
the probability that we actually DO have a matching entry+exit pair
given our Correlator says that we do. This is known as the Bayesian
Detection Rate, and is written as P(M|C).



More information about the tor-dev mailing list