No subject


Tue Mar 1 03:41:44 UTC 2011


P(M|C) = P(C|M)*P(M)/P(C)

Summing out M to obtain P(C):
P(C) = P(M)*P(C|M) + P(~M)*P(C|~M)

Combined:
P(M|C) = P(C|M)*P(M)/(P(M)*P(C|M) + P(~M)*P(C|~M))


Example 1: Fully Correlating Global Adversary

This example deals with an adversary trying to correlate EVERYTHING on
a 5000 concurrent stream network at all times. 5000 concurrent streams
means 5000 entry connections and 5000 exit connections (as an
approximation). This gives 5000^2 possible entry-exit pairings
network-wide, thus:

P(M) = (1/5000)^2 = 4E-8

99.9% EER generic correlation detector:
P(C|M) = 0.999
P(C|~M) = 0.001

P(M|C) = 0.999*4E-8/(4E-8*0.999 + (1-4E-8)*0.001)
P(M|C) = .00004 or .004%.

In other words, we expect that for every 25000 times the correlator
predicts a matching pair, only one of those actually is a valid
match. So much for dragnet surveillance.


Example 2: Single Site-targeting Global Adversary

In this example, the adversary is only interested in connections to a
particular site, say wikileaks.org. For this, let's say the adversary
only has one exit stream at a given time to correlate to a given entry
stream, giving:

P(M) = 1/5000 = .0002

P(M|C) = 0.999*.0002/(.0002*0.999 + (1-.0002)*0.001)
P(M|C) = .1666 = 16.66%

So this means that we expect the adversary to have 6 suspect users for
every user that leaks a document to wikileaks.org for a 99.9% accurate
correlator. Unlike the dragnet adversary, the targeted adversary is still
pretty effective, especially since they will almost certainly have additional
a-priori information about the users they suspect to have done the leak.

However, if their correlator drops to just 99% EER, however, P(M|C) drops
to 0.0194 or 1.94%. At 90% EER, P(M|C) is 0.0018 or 0.18%. These
provide with expectations of confusion sets of 52 and 556 users respectively.


Example 3: Single Site-targeting "Local" Adversary

To extend this into the capabilities of a local adversary (as opposed
to global), insert a (c/n)^2 factor into P(M) to account for the likelihood
that the adversary will see both sides of a connection, where c is the
percentage of network bandwidth they control, and n is the total network
capacity. Common accepted reasonable values for c/n are on the order
of 0.1, though this may be much higher for IX-level yet not quite fully
global adversaries[3]. Let's go with 0.3.

P(M) = (1/5000)*(0.3)^2 = .000018

P(M|C) = 0.999*.000018/(.000018*0.999 + (1-.000018)*.001)
P(M|C) = 0.0177 = 1.77%

So this means that an adversary with control of 30% of the network
(such as China, the US, or Germany) can expect to go through 57 other
users before coming across a legitimate match predicted by their
timing detector. At 99% EER, this number goes up to 562, and at 90%
EER, all the way up to 6173.

This seems a bit more infeasible, but may still be doable with enough
information or many repeated observations.


Conclusions

So what does this all mean? Well, first of all, it underscores the
importance of being absolutely clear in timing attack research about
exactly what success probabilities are being reported, so we can
better compare both attacks and defenses. In fact, in the opinion of
this humble Raccoon, a large body of work is somewhat suspect for
lack of clarity, both in this and other respects.

Second, it gives us a small glimmer of hope that maybe all is not lost
against IX, National, or ISP level adversaries. Especially those with
only sampled or connection-level resolution, where fine-grained
details on inter-packet timings is not available (as will likely be
the case with EU data retention).

Finally, it also quantifies that we certainly do benefit from a larger
anonymity network not just in terms of nodes, but also in terms of
total concurrent users doing similar things (like short-lived web
traffic). This quantification strongly indicates that we should avoid
splitting the network into segments if we want to gain any additional
utility from growing it further, aside from simply supporting more
users at some fixed level of anonymity.



[1]. http://www.raid-symposium.org/raid99/PAPERS/Axelsson.pdf
[2]. http://www.stinkymeat.net/
[3]. http://www.cl.cam.ac.uk/~sjm217/papers/pet07ixanalysis.pdf
[4]. "If you only cite a handful of works, either you are doing
something incredibly novel, or you're not nearly as novel as you
thought."
[5]. "Good artists imitate, great artists steal."



More information about the tor-dev mailing list