Optimizing tor bandwidth

Sun Mar 16 00:00:18 UTC 2003

I've been mulling over some ways to make our bandwidth use more efficient
(either for the exit node, the tor network itself, or the user):

a) Move cell size from 128 to 256 bytes. The theory here is that most
cells are already packed, because a given cell is far more likely to be
part of a bulk transfer. So if we lose 8 bytes to overhead either way,
why not more than double the payload (from 120 bytes to 248)?
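
To put rough numbers on a), here is a small sketch. It assumes the 8
bytes of overhead quoted above is a fixed per-cell header at both sizes,
and that a partly filled final cell is padded out to full size. The
function names are made up for illustration.

```python
import math

HEADER_BYTES = 8  # per-cell overhead quoted above; assumed fixed at both sizes


def payload_bytes(cell_size):
    """Bytes of payload left in a cell after the header."""
    return cell_size - HEADER_BYTES


def cells_needed(data_len, cell_size):
    """Number of cells required to carry data_len bytes of payload."""
    return math.ceil(data_len / payload_bytes(cell_size))


def wire_bytes(data_len, cell_size):
    """Total bytes sent, counting headers and padding in the last cell."""
    return cells_needed(data_len, cell_size) * cell_size


if __name__ == "__main__":
    for size in (128, 256):
        print(size, payload_bytes(size),
              wire_bytes(100_000, size), wire_bytes(1, size))
```

For a 100,000-byte bulk transfer this gives 106,752 bytes on the wire
with 128-byte cells and 103,424 with 256-byte cells. For a one-byte
payload (a single ssh keystroke, say) it is 128 bytes versus 256, so the
win depends on cells really being full most of the time.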

b) Remove the unused 4 bytes in the cell header. Let's not do this,
because it probably doesn't matter much. Especially if we do a).

(Aside: Can we put length and payload next to each other, so you do a
single crypt call to cover both of them? Does that matter? Is 121 bytes
a really stupid number to do a crypt on, or does it matter?)

c) Splice multiple cells into one cell, if their payloads are small.
This way we can queue up ssh connection cells and put them into a single
data cell. But I think this is more trouble than it's worth: first,
we have to figure out how to queue things but not hold on to them for
too long; second, we have to figure out how to splice them together;
and third, a given cell is probably already full anyway.
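
The queueing problem in c) might look something like the sketch below.
It assumes all the small payloads belong to the same stream, so they can
simply be concatenated, and that the caller polls often enough to honor
the hold limit. The class name, the 120-byte payload size, and the 50 ms
hold limit are illustrative assumptions, not anything tor does today.

```python
PAYLOAD_MAX = 120   # assumed: 128-byte cell minus 8 bytes of overhead
MAX_HOLD = 0.05     # hypothetical: longest we hold a byte, in seconds


class Coalescer:
    """Buffer small writes; emit payloads that are full or overdue."""

    def __init__(self, payload_max=PAYLOAD_MAX, max_hold=MAX_HOLD):
        self.payload_max = payload_max
        self.max_hold = max_hold
        self.buf = bytearray()
        self.oldest = None  # arrival time of the oldest buffered byte

    def add(self, data, now):
        """Queue data; return a list of full payloads ready to send."""
        if not data:
            return []
        if not self.buf:
            self.oldest = now
        self.buf += data
        out = []
        while len(self.buf) >= self.payload_max:
            out.append(bytes(self.buf[:self.payload_max]))
            del self.buf[:self.payload_max]
        if out:
            # Anything left over arrived in this call, so restart the timer.
            self.oldest = now if self.buf else None
        return out

    def poll(self, now):
        """Return the partial payload if it has waited max_hold, else None."""
        if self.buf and now - self.oldest >= self.max_hold:
            out = bytes(self.buf)
            self.buf.clear()
            self.oldest = None
            return out
        return None
```

Even this toy version needs a timer, a per-stream buffer, and a policy
for the hold limit, which is roughly the "more trouble than it's worth"
point above.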

d) Compress the data with zlib on one side and decompress it on the
other. That way we can handle more traffic over the tor backbones, and
we also deliver more over a given user's pipe.

(Tor was already *faster* than going straight to the web sites, for some
people, because all the nodes were on moria and person->MIT->website was
a better route than person->website. Though clearly this will degrade
quickly as we spread out the nodes. But allowing compressed downloads
may be a long-term win.)
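
For d), a minimal sketch of the two sides using Python's zlib module.
Which nodes would do the compressing and decompressing is left open
above; this just assumes one compressor and one decompressor per
direction of a stream. The class names are hypothetical.

```python
import zlib


class CompressingSide:
    """Compress a stream chunk by chunk, flushing after every chunk."""

    def __init__(self):
        self._c = zlib.compressobj()

    def feed(self, data):
        # Z_SYNC_FLUSH emits everything buffered so far, so the peer can
        # decode this chunk without waiting for more input.
        return self._c.compress(data) + self._c.flush(zlib.Z_SYNC_FLUSH)


class DecompressingSide:
    """Decompress the chunks produced by CompressingSide, in order."""

    def __init__(self):
        self._d = zlib.decompressobj()

    def feed(self, data):
        return self._d.decompress(data)


if __name__ == "__main__":
    sender, receiver = CompressingSide(), DecompressingSide()
    page = b"<tr><td>some repetitive html</td></tr>\n" * 100
    wire = sender.feed(page)
    assert receiver.feed(wire) == page
    print(len(page), "bytes in,", len(wire), "bytes on the wire")
```

Flushing after every chunk keeps interactive traffic from stalling, at
the cost of a few extra bytes per flush and a somewhat worse compression
ratio. Data that is already compressed (images, archives) will not
shrink.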

e) Rather than having a separate sendme cell with a useless payload,
should we have sendme cells *be* data cells, just with a different
command? That way a sendme can double up and carry data whenever there
is data waiting. Similarly we can have sendme topic commands also carry
data. The first caveat is
that we can't afford to delay sendmes, and figuring out how to queue
things efficiently is hard. The second caveat is that we need to either
count sendmes as data now for purposes of flow control, in which case
we can get deadlocks, or not count them, in which case a bad guy can
still get his connections through even when the window is empty (though we
could solve that by not allowing a sendme if your window is already at
max...). Overall probably not worth it.
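
The "don't allow a sendme if your window is already at max" rule from e)
could be sketched as below. This is one reading of the idea, where data
piggybacked on a sendme is not counted against the window. The window
size and increment are placeholder values, and the class is hypothetical.

```python
WINDOW_MAX = 1000        # hypothetical starting/maximum window, in cells
SENDME_INCREMENT = 100   # hypothetical cells credited per sendme


class SendWindow:
    """Our sending window on one connection, credited by the peer's sendmes."""

    def __init__(self):
        self.window = WINDOW_MAX

    def can_send_data(self):
        return self.window > 0

    def on_data_sent(self):
        """Account for one data cell we have just sent."""
        if not self.can_send_data():
            raise RuntimeError("window empty; must wait for a sendme")
        self.window -= 1

    def on_sendme(self, piggybacked=b""):
        """Handle a sendme that may carry data.

        Returns the data to deliver, or None if the sendme is refused
        because crediting it would push the window past its maximum.
        """
        if self.window + SENDME_INCREMENT > WINDOW_MAX:
            return None
        self.window += SENDME_INCREMENT
        return piggybacked
```

With this check the peer can slip in at most one uncounted payload per
SENDME_INCREMENT cells that we have actually sent, rather than an
unlimited number.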

f) Run squid on the exit node, to help prevent traffic analysis, and to
help reduce bandwidth load on that node.

What do you think about the above? What else could we do?
--Roger