Proposal: Token Bucket

Florian Tschorsch tschorsch at cs.uni-duesseldorf.de
Thu Feb 10 18:46:15 UTC 2011


Hi Nick,

sorry for the long round-trip time (even exceeding those in the Tor
network ;) ).

On Mo, 2011-01-24 at 13:40 -0500, Nick Mathewson wrote: 
>> If incoming and outgoing bandwidth limits are equal, an increase in the
>> volume of traffic within the onion router, in fact, even constitutes an
>> even stronger argument for our proposed modifications: if you have a
>> system (here: the queues in the onion router) where you continuously
>> admit more data into the system (incoming bandwidth limit + additional
>> overhead and traffic) than you allow to leave the system (outgoing
>> bandwidth limit), you will necessarily build up longer and longer queues
>> within the system, which will in turn cause very long cell delays -
>> which is just what we currently observe in Tor. Therefore, limiting the
>> amount of traffic on both ends is simply not a good idea.
> 
> Ah, I think I wasn't clear about the scope of my objection.  I'm not
> arguing against the idea of doing something useful to ameliorate the
> queueing issue: I am arguing against the original proposal, which was
> (if I understood correctly) to eliminate outbound rate limiting
> entirely.  The other proposals you suggest below (limiting read to a
> fraction of write, either adaptively or at a fixed degree) are much
> IMO better.

we were indeed proposing to eliminate the outgoing rate limitation
completely, and instead to establish other means to keep Tor's bandwidth
within the desired limits - by limiting the amount of data going into a
relay and/or being "produced" in a relay. This will avoid data "piling
up" in the system. I admit that there may exist scenarios where a too
simple solution could cause trouble, though - I don't have much to say
against your attacker driving a relay to exceed its configured bandwidth
limits.

I would nevertheless still very much prefer a solution that can get rid
of outgoing rate limitation entirely, if (and only if - you're perfectly
right about this) we manage to build the rest of the system in a way
that it will anyway stay within the configured limits. Simply because
such a solution is much easier to understand and analyze, and will thus
be much less likely to cause any unexpected trouble in the future (like
the side effects that we observe with the current design).

> The reason that I can't agree with the original proposal ("remove the
> rate limiting mechanism on the outgoing side") is that it will make
> Tor nodes potentially write much, much more than they have been
> configured to write.  Operators often configure rate limits for good
> reasons: to conserve limited bandwidth, to save money if they're
> paying by the byte, or to avoid bothering their local network
> administrators.  When Tor violates its configured rate limits,
> operators get angry.  Since we depend on volunteers to operate the Tor
> network, we need to keep them happy, and so using more of their
> bandwidth than they've agreed to provide us is not a good idea.

I fully agree with your point, and that aspect is certainly to be kept
in mind. So, the question is: how can we ensure that the configured rate
limits are not exceeded with the lowest possible system complexity, and
in conjunction with a queueing design that is stable and has good
performance.

>> One may of course argue that the (admittely in extreme cases
>> significant) packaging overhead should be accounted for when Tor
>> enforces the configured bandwidth limits. However, to be consistent, one
>> would then also have to account for TLS, TCP, and IP header overhead,
>> which is currently neglected. This overhead will also be very
>> significant if traffic arrives *that* slowly, i.e., if you have TCP
>> segments far smaller than a cell size.
>> 
>> This will certainly be interesting to assess in more detail. In case it
>> actually turns out to be an issue, I see two options:
>> 
>> i) the pragmatic solution: based on a measurement study how much the
>> traffic typically "grows" due to packaging overhead etc., reduce the
>> configured bandwidth limit by an appropriate percentage
>> 
>> ii) an "observe+adjust" mechanism: observe how much traffic is actually
>> leaving the router; if the outgoing traffic exceeds the configured
>> limit, reduce the rate at which data is read on the incoming side
>> accordingly
>> 
>> The latter, if done right, would be able to guarantee that configured
>> bandwidth limits are strictly enforced, despite additional traffic being
>> generated in the router, and in even in case of an arbitrarily high
>> packaging overhead. However, my feeling is that this is a too complex
>> solution for a very simple problem.
> 
> I think we do need to go with something closer to the latter more
> sophisticated approach for a couple of reasons.
> 
> First, as noted above, we really do need to enforce the configured
> bandwidth limits.
> 
> Second, we must consider attackers.  It's not enough to say that based
> on empirical evidence, directory responses currently occupy a small
> fraction of data written, or that with current traffic loads each cell
> is on average (say) 70% full.  If an attacker can make lots of
> directory requests, or trickle data one byte at a time on a large
> number of streams, he can cause the outbound data to exceed the
> inbound data by much more than expected.

Well - I could answer that pragmatically: you cannot enforce these rate
limits anyway (at least not from within the Tor software). An attacker
can easily generate arbitrary amounts of *incoming* data - which will
likely make the network administrators equally angry. :) But anyway,
yes, I do see your point, and we should take such attacks into account.

> So I think the only approach that's going to work here is to hold the
> outbound rate fixed while adaptively adjusting the input rate
> downwards until we stop generating outbound traffic faster than we can
> send it.

In our discussions here Florian Tschorsch had an IMHO very nice idea: we
continue to use a token bucket on the incoming side to limit the
incoming rate anyway. It will also be easy to monitor (not limit!) the
outgoing rate. We might make use of these two mechanisms to proceed as
follows: if we notice that the outgoing rate exceeds the configured
limit, then we might remove a corresponding amount of tokens from the
bucket on the *incoming* side, thereby slowing down the rate at which
data (including relay traffic, directory requests,...) flows into the
router. (We might allow the incoming bucket to take on negative fill
levels as we do so, so that temporarily exceeding the configured rate
will be compensated by entirely stopping the incoming data flow until
the excess has been compensated by "future" tokens.)

I like that solution because it appears effective - it is easy to see
that it will strictly limit the outgoing traffic to the configured limit
- while at the same time it does not need any rate limiting mechanism on
the outgoing side (it therefore avoids any double-door effects). It is
also reasonably simple and will thus (hopefully) not cause undesired
side effects.

It is still an additional closed control loop within the Tor router
(which will hopefully not cause stability problems in the future), but
in any case it appears much simpler than the current solution. It might
be worth to think about this for some more time, but so far Florian's
idea appears quite convincing to me.

>> We did some measurements on our onion router, and also many simulations.
>> 1/3 second still seems far too coarse. One would have to come
>> significantly below the RTT of the TCP links between onion routers. We
>> therefore tend more towards something in the order of 1/100 s. (We did
>> not notice any difference in CPU utilization when we made this change -
>> so this apparently isn't an issue.)
> 
> Exciting!  I'd be interested to see whether this also holds with the
> Libevent 2 ratelimiting code as run on a live network, or if there is
> some inefficiency there that we need to address.

What we (or rather: Florian) did during the past two weeks is to prepare
a more "complete" and "clean" patch for the libevent1 code, in
collaboration with Karsten Loesing and Sebastian Hahn (btw, thanks for
your efforts!). The patch removes the outgoing rate limitations; in the
light of the discussion above, this should be reconsidered, but should
IMHO be fine for testing purposes. The patch also allows for shorter
refill intervals. We would like to test this on two or three more
"real-world" relays, collecting statistics on the cell queueing time
with and without the patch. 

As of my understanding, Karsten and/or Florian will contact you soon
regarding this patch.


Best regards

Björn (and Florian) 



-- 
Florian Tschorsch
Mobile and Decentralized Networks
Heinrich-Heine-University
Universitätsstr. 1, D-40225 Düsseldorf

Building 25.12, Room 02.43
Phone +49 211 81 11635
Fax +49 211 81 11638

tschorsch at cs.uni-duesseldorf.de
http://www.cn.uni-duesseldorf.de



More information about the tor-dev mailing list