SPD talk: "Simulating a Global Passive Adversary for Attacking Tor-like Anonymity Systems"?

Thu Jun 12 23:26:48 UTC 2008

Thus spake gojosan at mailhaven.com (gojosan at mailhaven.com):

> I just noticed this talk at the Security and Privacy Day from May 2008. 
> While I understand that Tor's thread model does not defend against a GPA
> I am still curious what effect this attack can have against the current,
> real Tor network?  
> 
> Simulating a Global Passive Adversary for Attacking Tor-like Anonymity
> Systems
> http://web.crypto.cs.sunysb.edu/spday/

A handful of comments about the paper (many of these they themselves
brought up, but some they did not):

0. They are really an active adversary here. They need to be
controlling a website in order to unmask users, or have control of
some exit nodes or their upstream routers, and must modulate the
bandwidth of TCP connections considerably (+/- 30-40KB/sec or more,
for a period of 10 minutes).

1. Their results for path recognition (25% false negatives, 10% false
positives) are based on their limited sample set of only 13 trial
circuits via a small set of nodes that must be geographically close to
and well-peered with their pinging 'vantage point(s)'. I suspect
reality is a lot less forgiving than this limited sample size
indicates when more nodes and trials are involved using the real Tor
path selection algorithm.

2. The bandwidth estimation technique they utilized (based on TCP
Westwood/CapProbe/PacketPair) is very sensitive to any queuing and
congestion that occurs between the target and the 'vantage point(s)'.
As soon as congestion happens along the path, these types of estimates
report a large amount of excess capacity (rather than no capacity) due
to the acks/responses getting compressed together in queues. The way
this has been 'fixed' in TCP Westwood+ is to filter out the high
estimates and perform weighted averaging to smooth fluctuations
(precisely what they are trying to measure). It would have been nice
if they provided some more realistic testing of their bandwidth
estimation consistency using real world nodes as opposed to the lab
results on half-duplex ethernet.

3. Based on my measurements last year, only the top ~5-10% nodes are
capable of transmitting this much data in an individual stream, and
only if all of the nodes in your path are from this set. Furthermore,
as load balancing improves (and we still have more work to do here
beyond my initial improvements last year), these averages should in
theory come down for these nodes (but increase for slower nodes). So
how they will fair once we figure out the bottlenecks of the network
is unknown. They could do better in this case, but it is probably more
likely the average stream capacity for most nodes will drop below
their detection threshold.

4. Right now these few fast nodes carry about 60% of the network
traffic. A rough back of the envelope calculation based on our
selection algorithm means that only ~22% (.6*.6*.6) of the paths of
the network have this property for normal traffic, and only ~4.5% of
hidden service paths (which are 6 hops).

5. Their error rates do get pretty high once they've begun
trying to trace the stream back to its ISP (on top of the rates for
just path recognition). Any other fluctuations in traffic are going to
add error to this ability, and I imagine traffic fluctuates like crazy
along these paths. They also assume full a-priori knowledge of these
routes which in practice means a full map of all of the peering
agreements of the Internet, and 'vantage point(s)' that have no
queuing delay to all of them..

A couple countermeasures that are possible:

1. Nodes that block ICMP and filter closed TCP ports are less
susceptible to this attack, since they would force the adversary to
measure the capacity changes at upstream routers instead (which will
have other noise introduced due to peers utilizing the link as well). I
am wondering if this means we should scan the network to see how many of
these top nodes allow ICMP and send TCP resets, and if it is feasible to
notify their operators that they may want to consider improving their
firewalls, since we're only talking about 100-150 IPs here. There are a
lot more critical things to scan for though, so this is probably lower
priority.

2. Roger pointed out that clients can potentially protect themselves
by setting 'BandwidthRate 25KB' and setting 'BandwidthBurst' to some
high value, so that short lived streams will still get high capacity
if it is available, but once streams approach the 10-20minute lifetime
needed for this attack to work, they should be below the detectable
threshold. I think this is a somewhat ugly hack, and should probably
be governed by a "High Security Mode" setting that would be
specifically tuned to this purpose (and be a catching point for other
hacks that protect against various attacks but at the expense of
performance/usability).

All this aside, this is a very clever attack, and further evidence
that we should more closely study capacity properties, reliability
properties, queuing properties, and general balancing properties of
the network.

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20080612/c98b7fe3/attachment.pgp>