Proposal: Token Bucket
Nick Mathewson
nickm at freehaven.net
Mon Jan 24 18:40:00 UTC 2011
2011/1/19 Björn Scheuermann <scheuermann at cs.uni-duesseldorf.de>:
> Dear Nick,
>
> thanks a lot for the feedback.
>
> On Mi, 2011-01-19 at 14:24 -0500, Nick Mathewson wrote:
>> 2011/1/3 Florian Tschorsch <tschorsch at cs.uni-duesseldorf.de>:
>> [...]
>> > First, we observe that the token bucket for the relayed traffic on the
>> > outgoing connections is unnecessary: since no new such traffic is generated
>> > in an onion router, the rate of this traffic is already limited by the read
>> > bucket on the incoming side (cf. RelayedTokenBucket).
>>
>> This is not strictly true, I think. Outbound cells with no
>> corresponding incoming can be generated by local directory requests,
>> by local bandwidth testing, and probably a couple of other things too.
>
> In the strict sense, you are of course right. However, the amount of
> traffic caused by these functions is likely to be negligibly small (if I
> remember that correctly, someone [Karsten Loesing?] investigated the
> amount of directory traffic in some other context, and found that it
> *is* negligible).
>
> If incoming and outgoing bandwidth limits are equal, an increase in the
> volume of traffic within the onion router in fact constitutes an even
> stronger argument for our proposed modifications: if you have a
> system (here: the queues in the onion router) where you continuously
> admit more data into the system (incoming bandwidth limit + additional
> overhead and traffic) than you allow to leave the system (outgoing
> bandwidth limit), you will necessarily build up longer and longer queues
> within the system, which will in turn cause very long cell delays -
> which is just what we currently observe in Tor. Therefore, limiting the
> amount of traffic on both ends is simply not a good idea.
Ah, I think I wasn't clear about the scope of my objection. I'm not
arguing against the idea of doing something useful to ameliorate the
queueing issue: I am arguing against the original proposal, which was
(if I understood correctly) to eliminate outbound rate limiting
entirely. The other proposals you suggest below (limiting read to a
fraction of write, either adaptively or by a fixed factor) are IMO much
better.
The reason that I can't agree with the original proposal ("remove the
rate limiting mechanism on the outgoing side") is that it will make
Tor nodes potentially write much, much more than they have been
configured to write. Operators often configure rate limits for good
reasons: to conserve limited bandwidth, to save money if they're
paying by the byte, or to avoid bothering their local network
administrators. When Tor violates its configured rate limits,
operators get angry. Since we depend on volunteers to operate the Tor
network, we need to keep them happy, and so using more of their
bandwidth than they've agreed to provide us is not a good idea.
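To make the mechanism concrete, here is a minimal sketch of a
write-side token bucket of the kind under discussion. The names and
structure are illustrative assumptions, not Tor's actual code;
removing the outgoing bucket amounts to deleting the
bucket_writeable() check, leaving writes bounded only by the OS.

/* A simplified write-side token bucket.  Hypothetical names; not
 * Tor's actual implementation. */
#include <stddef.h>

typedef struct {
  long tokens;   /* bytes we may still write before the next refill */
  long rate;     /* configured bytes added per refill interval */
  long burst;    /* maximum bucket size (cf. BandwidthBurst) */
} token_bucket_t;

/* Called once per refill interval. */
static void
bucket_refill(token_bucket_t *b)
{
  b->tokens += b->rate;
  if (b->tokens > b->burst)
    b->tokens = b->burst;
}

/* How many of 'want' bytes may we write right now?  Without an
 * outbound bucket, this check disappears entirely. */
static size_t
bucket_writeable(const token_bucket_t *b, size_t want)
{
  if (b->tokens <= 0)
    return 0;
  return want < (size_t)b->tokens ? want : (size_t)b->tokens;
}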
> I really believe that this is a major issue which significantly (and
> unnecessarily) limits the performance. Florian and I will be happy to
> contribute a patch if everyone agrees.
>
>> Also, when traffic arrives slowly on an edge connection, packaging it
>> into cells adds significant overhead (since every outgoing cell is 512
>> bytes+TLS overhead but not every outgoing cell is full).
>
> One may of course argue that the (admittedly, in extreme cases,
> significant) packaging overhead should be accounted for when Tor
> enforces the configured bandwidth limits. However, to be consistent, one
> would then also have to account for TLS, TCP, and IP header overhead,
> which is currently neglected. This overhead will also be very
> significant if traffic arrives *that* slowly, i.e., if you have TCP
> segments far smaller than a cell size.
>
> This will certainly be interesting to assess in more detail. In case it
> actually turns out to be an issue, I see two options:
>
> i) the pragmatic solution: based on a measurement study of how much the
> traffic typically "grows" due to packaging overhead etc., reduce the
> configured bandwidth limit by an appropriate percentage
>
> ii) an "observe+adjust" mechanism: observe how much traffic is actually
> leaving the router; if the outgoing traffic exceeds the configured
> limit, reduce the rate at which data is read on the incoming side
> accordingly
>
> The latter, if done right, would be able to guarantee that configured
> bandwidth limits are strictly enforced, despite additional traffic being
> generated in the router, and even in the case of an arbitrarily high
> packaging overhead. However, my feeling is that this is too complex a
> solution for a very simple problem.
I think we do need to go with something closer to the latter, more
sophisticated approach, for a couple of reasons.
First, as noted above, we really do need to enforce the configured
bandwidth limits.
Second, we must consider attackers. It's not enough to say that based
on empirical evidence, directory responses currently occupy a small
fraction of data written, or that with current traffic loads each cell
is on average (say) 70% full. If an attacker can make lots of
directory requests, or trickle data one byte at a time on a large
number of streams, he can cause the outbound data to exceed the
inbound data by much more than expected.
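To put a rough number on the trickle case: a single byte read on a
stream still has to leave as a full 512-byte cell plus TLS record
overhead, so the per-stream amplification can be on the order of 500x.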
So I think the only approach that's going to work here is to hold the
outbound rate fixed while adaptively adjusting the input rate
downwards until we stop generating outbound traffic faster than we can
send it.
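As a rough illustration of what I mean, here is a sketch of a control
step run once per second. Everything here (the names, the proportional
backoff, the recovery rate) is an assumption made for the sketch, not
a worked-out design:

/* Keep the outbound limit fixed; scale the read rate down whenever
 * the bytes actually written last second exceeded the configured
 * write limit, and recover slowly otherwise. */
static void
adjust_read_rate(long *read_rate, long bytes_written_last_sec,
                 long write_limit, long read_limit)
{
  if (bytes_written_last_sec > write_limit) {
    /* Overshot: shrink the read rate proportionally. */
    *read_rate = (long)(*read_rate *
                        ((double)write_limit / bytes_written_last_sec));
  } else if (*read_rate < read_limit) {
    /* Under the limit: creep back toward the configured read rate. */
    *read_rate += read_limit / 100;
    if (*read_rate > read_limit)
      *read_rate = read_limit;
  }
}

A real version would want some smoothing, so that a momentary burst of
directory traffic doesn't crater the read rate for everyone.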
>> > We therefore propose
>> > to remove the rate limiting mechanism on the outgoing side. This will
>> > eliminate the "double door effect" discussed above, since all cells are
>> > allowed to flow freely out of the router once they have passed the
>> > incoming rate limiter.
>> >
>> > Second, the refill interval of the buckets should be shortened. The
>> > remaining token buckets should be refilled more often, with a
>> > correspondingly smaller amount of tokens. For instance, the buckets might
>> > be refilled every 10 milliseconds with one-hundredth of the amount of data
>> > admissible per second.
>>
>> Smaller bucket refill intervals are already implemented in Tor master
>> if you build with Libevent 2.0 and use the bufferevents backend by
>> passing --enable-bufferevents to configure. The refill interval is
>> currently set to 1/3 of a second, but that value was chosen more or
>> less at random; it would be neat to see other values benchmarked as
>> well.
>
> We did some measurements on our onion router, and also many simulations.
> 1/3 second still seems far too coarse. One would have to come
> significantly below the RTT of the TCP links between onion routers. We
> therefore tend more towards something on the order of 1/100 s. (We did
> not notice any difference in CPU utilization when we made this change -
> so this apparently isn't an issue.)
Exciting! I'd be interested to see whether this also holds with the
Libevent 2 rate-limiting code as run on a live network, or if there is
some inefficiency there that we need to address.
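For concreteness, the finer-grained refill you describe would look
something like this, reusing the bucket struct from the sketch above
(the constant and callback name are again assumptions):

/* Refill every 10 ms with one-hundredth of the per-second allowance,
 * rather than once per second or per third of a second. */
#define REFILLS_PER_SEC 100  /* i.e., a 10 ms timer */

static void
refill_callback(token_bucket_t *b, long bytes_per_sec)
{
  b->tokens += bytes_per_sec / REFILLS_PER_SEC;
  if (b->tokens > b->burst)
    b->tokens = b->burst;
}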
--
Nick