[tor-dev] Request for comment: scheduler heuristics

Sat Nov 9 04:42:19 UTC 2013

As some of you may be aware, I've been working on new scheduler code for Tor
recently.  Currently, Tor calls channel_flush_some_cells() whenever a channel
becomes able to accept more writes, or channel_write_*_cell() whenever a new
cell arrives on a channel able to accept cells without other queued cells on
circuits.  Thus, we can only see one channel at a time when making scheduling
decisions, and the existing circuitmux code is only concerned with providing
an API for implementing prioritization of circuits attached to a particular
channel.

The new scheduler aims to remove this limitation.  To do this, we have to get
a global view of all possible choices we could make about which circuit to
get a cell from next, so instead we keep lists of channels which can currently
accept writes and which currently have cells available to send, and trigger
the scheduler using event_active() whenever a channel meets both conditions.
Then the scheduler runs from a libevent callback and loops over the list of
channels in the intersection of both sets.  At present it just invokes the
existing circuitmux code, so the behavior differs little from the old code,
but this provides a framework where we can potentially see multiple channels
simultaneously to make scheduling decisions.

Notice, though, that as described it will rarely actually see more than
one pending channel - just looping over them all every time is too greedy.
We want to put off making those choices long enough to see all our options,
without delaying so long the channel send queues downstream of the scheduler
loop start emptying and the delayed choice actually causes increased latency.

At first, NickM and I had discussed this and he suggested a global high/low-
water mark system parallel to the ones for each connection; I added code to
the channel layer to allow lower layer implementations to provide queue size
and overhead estimates in a generic way, so now the channel code has an
up to date estimate of the total number of bytes queued to send on all channels
in both lower-layer queues and channel_t queues.

I believe this design should not be used, because making the flow rate on one
channel affect the scheduling of others like that is a potential DoS.  If
an attacker controls one node, he can build a circuit through a target node
to that node, and then send traffic down it, but force the node to drain it
only very slowly, keeping the target's queue size always above a minimum
level.  If the low-water mark for the global queue is set higher than the
high-water mark for each connection it can prevent this, but remains vulnerable
to a similar attack using multiple channels to different nodes.

The simplest proposed fixes involve modifying the queue size used to control
the scheduler to age out old contributions from non-draining or slow-draining
channels - replace the actual queue size with an effective queue size which
drains linearly at a some fraction of the node's expect bandwidth, or drops
off exponentially.  These are straightforward to implement, but I'm unsure
what the best option is, so comment is invited.

-- 
Andrea Shepard
<andrea at torproject.org>
PGP fingerprint (ECC): BDF5 F867 8A52 4E4A BECF  DE79 A4FF BC34 F01D D536
PGP fingerprint (RSA): 3611 95A4 0740 ED1B 7EA5  DF7E 4191 13D9 D0CF BDA5
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 328 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20131108/e89e7360/attachment.sig>