Refactored or_connection_t buffers into cell queues.
Nick Mathewson
nickm at freehaven.net
Tue Mar 27 22:43:20 UTC 2007
Hi, all. Roger asked me to brain-dump here to explain the refactoring
I just did on how Tor handles buffering on OR connections. (This went
in as svn revisions 9904 through 9907.)
Previously, we did all of our buffering on the OR connection
structures: We'd pull a cell from an or_connection_t (or package one
from the edge), {en|de}crypt it, use the relevant circuit to figure
out what or_connection_t it should go onto, and append it to that
or_connection_t's outbuf.
The patches I just checked in add another step: after a cell is
encrypted and the relevant circuit is found, the cell is queued in
a linked list at the circuit_t. When the or_connection_t's outbuf
falls below a certain size, we pull cells from the appropriate cell
queues until the outbuf is reasonably full again.
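The per-circuit queue can be as simple as a singly linked list with head
and tail pointers. Here's a minimal sketch of that idea; the type and
function names are illustrative stand-ins, not necessarily the real fields:

```c
#include <stddef.h>

#define CELL_WIRE_SIZE 512  /* assumed encoded size of a cell on the wire */

/* Toy versions of a packed cell and the per-circuit FIFO queue. */
typedef struct packed_cell_t {
  struct packed_cell_t *next;
  char body[CELL_WIRE_SIZE];
} packed_cell_t;

typedef struct cell_queue_t {
  packed_cell_t *head;
  packed_cell_t *tail;
  int n;
} cell_queue_t;

/* Append a cell at the tail of the queue. */
static void cell_queue_append(cell_queue_t *q, packed_cell_t *cell)
{
  cell->next = NULL;
  if (q->tail)
    q->tail->next = cell;
  else
    q->head = cell;
  q->tail = cell;
  ++q->n;
}

/* Pop the oldest cell from the head, or NULL if the queue is empty. */
static packed_cell_t *cell_queue_pop(cell_queue_t *q)
{
  packed_cell_t *cell = q->head;
  if (!cell)
    return NULL;
  q->head = cell->next;
  if (!q->head)
    q->tail = NULL;
  --q->n;
  return cell;
}
```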
There are a few more details here that I'll explain below. First,
I'll talk about why it's a good idea:
- The ring buffer implementation in buffer.c amortizes resizing costs
  by doubling buffers when they're full and halving them when
  they're 3/4-empty. Thus, many buffers spend most of their
  time largely empty: this means that we wind up malloc'ing a lot
  more memory than we need.
- There are a lot of features we want to add that require us to
assign different priority to different kinds of traffic. (For
example, some relay-incentive schemes need us to give priority to
traffic from other ORs.)
- We need to kill connections when their buffers use up too much
RAM. But when that happens, we kill every circuit on the
connection. It might be better to kill only the circuits that have
a lot of data queued.
- Tracking circuit queues separately from one another lets us fill
circuits in a fair fashion: We can start reading from exit
connections only at a rate that the circuits are able to flush
themselves, rather than at the maximum rate possible.
Implementation issues:
1) I've also done a patch to stop reading on edge connections when the
circuit gets too full, and start reading on the edge connections
again when the circuit gets emptier.
[Maybe we should aim to read from edge connections at constant
rates rather than stopping and starting like this. That will be
harder, but will probably give better TCP performance.]
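Using the 256-cell and 64-cell edge thresholds from point 4 below, the
stop/start decision can be sketched as a simple hysteresis check. The
function name and the exact shape of the rule are my own illustration:

```c
/* Assumed edge-packaging thresholds, in cells (see point 4). */
#define CELL_QUEUE_HIGHWATER 256
#define CELL_QUEUE_LOWWATER   64

/* Returns 1 if we should be reading from the edge connection, 0 if we
 * should stop.  Between the two thresholds we keep doing whatever we
 * were doing, so we don't flap on every single cell. */
static int edge_should_read(int queued_cells, int currently_reading)
{
  if (queued_cells >= CELL_QUEUE_HIGHWATER)
    return 0;                 /* too much backlog: stop packaging */
  if (queued_cells <= CELL_QUEUE_LOWWATER)
    return 1;                 /* drained enough: start reading again */
  return currently_reading;   /* in the hysteresis band: no change */
}
```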
2) The way the code is currently implemented, we check whether to add
more cells to an outbuf whenever we flush from that outbuf. This
doesn't work when adding the first few cells to circuit queues,
since there's nothing to flush. Therefore, whenever we add a cell
to a cell queue and the or_conn's outbuf is empty, we write the
cell straight to the outbuf.
[I think this rule should maybe become "add cells straight to the
outbuf whenever it is small", but I don't know whether that's really
an improvement.]
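The rule in point 2 amounts to a single branch at queue time. Here's a
toy sketch with stand-in structures (an "outbuf" reduced to a byte
count), just to show the shape of the special case:

```c
#include <stddef.h>

#define CELL_WIRE_SIZE 512  /* assumed encoded size of a cell */

/* Toy stand-ins for or_connection_t and circuit_t. */
typedef struct {
  size_t outbuf_len;   /* bytes currently waiting in the outbuf */
} fake_or_conn_t;

typedef struct {
  int queued_cells;    /* cells waiting in this circuit's queue */
} fake_circuit_t;

/* If the outbuf is empty, no future flush would ever check the queues,
 * so write the cell straight out; otherwise queue it on the circuit. */
static void queue_or_write_cell(fake_circuit_t *circ, fake_or_conn_t *conn)
{
  if (conn->outbuf_len == 0)
    conn->outbuf_len += CELL_WIRE_SIZE;  /* straight to the outbuf */
  else
    circ->queued_cells++;                /* flushed later */
}
```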
3) To avoid having to walk all the circuits on an or_connection (there
   can be quite a lot of them), we keep the _active_ circuits (those
   with pending cells) in a doubly-linked ring on each or_connection.
   An or_circuit is potentially attached to two or_connections, so
   each or_circuit can be in up to two of these rings, and so has two
   next and two prev pointers. This makes the link and unlink logic a
   little trickier than I'd like, but I can't immediately see how to
   simplify it. Have a look and drop me a line if you have a good
   idea.
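To make the two-rings business concrete, here's a sketch of the link
step. Every name here is illustrative, not the real code; the trick is
just that each circuit picks which next/prev pair to use based on which
connection it's being linked into:

```c
#include <stddef.h>

struct fake_conn;

/* A circuit attached to up to two connections, with one pair of ring
 * pointers per attachment. */
typedef struct fake_circ {
  struct fake_conn *n_conn, *p_conn;
  struct fake_circ *next_on_n, *prev_on_n;
  struct fake_circ *next_on_p, *prev_on_p;
} fake_circ_t;

typedef struct fake_conn {
  fake_circ_t *active_circuits;  /* any member of the ring, or NULL */
} fake_conn_t;

/* Select the next/prev pointer pair for this (circuit, connection). */
static fake_circ_t **next_ptr(fake_circ_t *c, fake_conn_t *conn)
{
  return (conn == c->n_conn) ? &c->next_on_n : &c->next_on_p;
}
static fake_circ_t **prev_ptr(fake_circ_t *c, fake_conn_t *conn)
{
  return (conn == c->n_conn) ? &c->prev_on_n : &c->prev_on_p;
}

/* Link circ into conn's ring of active circuits, at the tail. */
static void make_active(fake_conn_t *conn, fake_circ_t *circ)
{
  if (!conn->active_circuits) {
    *next_ptr(circ, conn) = circ;       /* singleton ring */
    *prev_ptr(circ, conn) = circ;
    conn->active_circuits = circ;
  } else {
    fake_circ_t *head = conn->active_circuits;
    fake_circ_t *tail = *prev_ptr(head, conn);
    *next_ptr(tail, conn) = circ;
    *prev_ptr(circ, conn) = tail;
    *next_ptr(circ, conn) = head;
    *prev_ptr(head, conn) = circ;
  }
}
```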
4) The stop-adding-cells and start-adding-cells thresholds for moving
cells from cell queues to or_connections are currently 32K and 16K.
The stop-adding-cells and start-adding-cells thresholds for
packaging cells from the edge are currently 256 cells and 64 cells.
[I have no good justification for any of these numbers besides the
16K one; for efficient bandwidth usage, we want as many full TLS
frames going out as possible.]
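As a sanity check on the 32K/16K outbuf numbers: once the outbuf drains
below the low-water mark, the refill amount falls out of the two
thresholds and the cell size. This helper (name and exact rounding are
mine, assuming 512-byte cells on the wire) shows the arithmetic:

```c
#include <stddef.h>

#define OR_CONN_HIGHWATER (32*1024)  /* stop refilling the outbuf here */
#define OR_CONN_LOWWATER  (16*1024)  /* start refilling below this */
#define CELL_WIRE_SIZE 512           /* assumed encoded cell size */

/* How many whole cells to pull from the active circuits to bring the
 * outbuf back up toward the high-water mark; 0 if it's full enough. */
static int cells_to_pull(size_t outbuf_len)
{
  if (outbuf_len >= OR_CONN_LOWWATER)
    return 0;
  return (int)((OR_CONN_HIGHWATER - outbuf_len) / CELL_WIRE_SIZE);
}
```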
5) The code currently does all the crypto before adding cells to
queues, but does the cell-to-bytes encoding when adding cells to
   outbufs. We could maybe get a little more RAM-efficient by
   packaging in advance too.
6) Cells are about to become the most heavily allocated and freed
structures in Tor. This is probably going to incur some malloc
overhead, especially on platforms where malloc() sucks. Since
cells are fixed-size, this arguably calls for arena allocation.
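Since cells are all the same size, even the crudest version of this
helps: keep freed cells on a free list and reuse them before calling
malloc() again. A real arena would also grab cells in big chunks; this
sketch (names mine) only shows the reuse half:

```c
#include <stdlib.h>

#define CELL_WIRE_SIZE 512  /* assumed encoded cell size */

/* A freed cell doubles as a free-list node, since its body is unused. */
typedef union free_cell {
  union free_cell *next;
  char body[CELL_WIRE_SIZE];
} free_cell_t;

static free_cell_t *free_list = NULL;

/* Reuse a previously freed cell if we have one; else malloc a new one. */
static void *cell_alloc(void)
{
  if (free_list) {
    free_cell_t *c = free_list;
    free_list = c->next;
    return c;
  }
  return malloc(sizeof(free_cell_t));
}

/* Return a cell to the free list instead of calling free(). */
static void cell_free(void *p)
{
  free_cell_t *c = p;
  c->next = free_list;
  free_list = c;
}
```

The union trick keeps the free-list pointer inside the cell itself, so
the list costs no extra memory.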
7) I stuck most of the cell queue logic in relay.c. Roger: let me
know if it belongs somewhere else instead.
8) There is no number eight.
peace,
--
Nick Mathewson