Stream starvation in Tor (with patch)
Ian Goldberg
iang at cs.uwaterloo.ca
Mon Nov 22 13:33:15 UTC 2010
or-dev folks,
A number of groups (the UW people, the UCSD people, Roger(?) and
possibly others) have noticed that when many active streams (more than
about 4) are opened on a single circuit, 3 or 4 of them get service and
the rest starve until those are finished. Note that, for example, most
web browsers will open multiple streams at once to fetch parts of web
pages.
Here's a plot of 9 active streams on one circuit:
https://thunk.cs.uwaterloo.ca/mashael/broken.png
(This graph is from measurements on a private Tor network on
PlanetLab, running 0.2.3.0-alpha-dev; the same behaviour was observed
previously on 0.2.2.*.)
Here's Mashael's writeup about the problem.
Our network consists of an authoritative directory, a client (also
running as an Onion Proxy), three Tor Onion Routers, and a server. We
refer to the three onion routers as Entry, Middle and Exit. Our client
constructs a circuit through the network, where the first hop is Entry,
the second is Middle and the third is Exit. The constructed circuit is
used to create multiple streams to the server. Each stream carries a
request from the client to the server, which replies with 10,000 cells.
We repeated this experiment with 2, 5, 9 and 13 streams. The common
result was that the last three streams created almost completely
starved the earlier streams until those last three had finished
transferring their 10,000 cells.
This problem only occurs when we have a bottleneck OR in the path. We
continue to see the problem as long as the bandwidth offered by the
bottleneck router is less than approximately 80% of the bandwidth that
is offered by the other routers in the circuit. For example, if the
bandwidthrate of Entry and Exit is 630 KB/s and 720 KB/s, respectively,
the problem is still visible even when the bandwidthrate of Middle is up
to 550 KB/s. We verified this by setting the BandwidthRate option for
Middle to a minimal value (20 KB/s) and then gradually increasing it
until the starvation of streams disappeared from the resulting graphs,
at approximately 600 KB/s.
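For concreteness, the bottleneck was induced with Tor's standard torrc
rate-limiting options on Middle. A minimal sketch of the kind of
fragment involved (BandwidthRate is the option we varied; pinning
BandwidthBurst to the same value is our assumption about the setup,
and the values shown are illustrative):

    # torrc for the Middle router: throttle it to create the bottleneck.
    # Rates are in bytes per second; "KB" means kilobytes.
    BandwidthRate 20 KB
    BandwidthBurst 20 KB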
The "streams problem" is caused by a complicated interaction between
Tor's congestion control and libevent. At some point during the
experiment, the circuit window is exhausted, which blocks all edge
streams. When a circuit-level sendme is received at Exit, it resumes
edge reading by looping over the linked list of edge streams and
calling connection_start_reading() to inform libevent to resume
reading. When the streams are activated again, Tor only gets the
chance to service the first three streams activated before the circuit
window is exhausted again, at which point all streams are blocked once
more. As an experiment, we reversed the order in which the streams are
activated, and indeed the first three streams, rather than the last
three, got service, while the others starved.
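To make the mechanism concrete, here is a simplified sketch of the
head-to-tail resume loop, modelled on the loop in Tor's relay.c (names
and structure are approximate, not the verbatim source):

    /* Sketch (not verbatim Tor source): resume reading on every edge
     * stream of a circuit, always starting from the head of the list.
     * Whichever streams sit near the head drain the reopened circuit
     * window first, every single time. */
    static void
    resume_edge_reading_sketch(edge_connection_t *first_conn)
    {
      edge_connection_t *conn;
      for (conn = first_conn; conn; conn = conn->next_stream) {
        /* Tell libevent to start reading from this stream again. */
        connection_start_reading(TO_CONN(conn));
      }
    }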
Our solution is to change the order in which streams are activated. We
choose a random edge connection from the linked list, and activate
streams starting from that chosen stream. When we reach the end of the
list, we continue from the head of the list until we return to the
chosen stream (treating the linked list as circular). It would
probably be better to actually remember which streams have received
service recently, but this way is simple and effective.
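Here is a sketch of the patched activation order (approximating what
the attached diff does, not quoting it); crypto_rand_int() is Tor's
uniform random integer helper:

    /* Sketch: pick a uniformly random stream on the circuit, activate
     * from there to the tail, then wrap around from the head, so no
     * fixed prefix of the list monopolizes the reopened window. */
    static void
    resume_edge_reading_randomized_sketch(edge_connection_t *first_conn)
    {
      edge_connection_t *conn, *chosen;
      int n_streams = 0, i;

      for (conn = first_conn; conn; conn = conn->next_stream)
        ++n_streams;                    /* count the streams */
      if (n_streams == 0)
        return;

      i = crypto_rand_int(n_streams);   /* uniform in [0, n_streams) */
      for (chosen = first_conn; i > 0; --i)
        chosen = chosen->next_stream;

      /* Activate from the chosen stream to the tail of the list... */
      for (conn = chosen; conn; conn = conn->next_stream)
        connection_start_reading(TO_CONN(conn));
      /* ...then wrap around from the head back to the chosen stream. */
      for (conn = first_conn; conn != chosen; conn = conn->next_stream)
        connection_start_reading(TO_CONN(conn));
    }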
Here's the graph again, with the patched Exit:
https://thunk.cs.uwaterloo.ca/mashael/fixed.png
The patch is attached.
- Ian and Mashael
-------------- next part --------------
Attachment: mashael_stream_fix.diff (text/x-diff, 2419 bytes)
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20101122/8fb25c93/attachment.diff>