[tor-dev] Connection, Channel and Scheduler - An Intense Trek

Thu Nov 16 13:56:59 UTC 2017

On 15 Nov (13:49:54), Nick Mathewson wrote:
> On Mon, Oct 30, 2017 at 3:57 PM, David Goulet <dgoulet at ev0ke.net> wrote:

[snip]

> >
> > Through this epic journey, we've discovered some issues as well as design
> > problems. Now the question is what should and can do about it?
> >
> > In a nutshell, there are a couple of questions we should ask our selfves
> > and try to answer so we can move forward:
> >
> > * I believe now that we should seriously discuss the relevance of
> > channels.  Originally, the idea was good that is providing an abstraction
> > layer for the relay to relay handshake and send/process cells related to
> > the protocol.  But, as of now, they are half doing it.
> >
> >   There is an important cost in code and maintanance of something that is
> >   not properly implemented/finished (channel abstraction) and also
> >   something that is unused. An abstraction implemented only for one thing
> >   is not really useful except maybe to offer an example for others? But we
> >   aren't providing a good example right now imo...
> >
> >   That being said, we can spend time fixing the channel subsystem, trying
> >   to turn it in a nicer interface, fixing all the issues I've described
> >   above (and I suspect there might be more) so the cell scheduler can play
> >   nicely with channels. Or, we could rip them off eliminating lots of code
> >   and reducing our technical debt. I would like us to think about what we
> >   want seriously because that channel subsystem is _complicated_ and very
> >   few of us fully understands it afaict.
> >
> >   Which would bring us back to (which is btw basically what we have now
> >   considering the channel queues are useless):
> >
> >     conn inbuf -> circ queue -> conn outbuf
> >
> >   If we don't want to get rid of channel, the fixes are non trivial. For
> >   starter, we have to decide if we want to keep the channel queue or not
> >   and if yes, we need to almost start from square 1 in terms of testing
> >   because we would basically introduce a new layer of queuing cells.
> >
> 
> So, this is the question I'm least sure about. Please take the following as
> tentative.
> 
> I think that the two choices ("refactor channels" and "rip out channels")
> may be less different than we think. Neither one is going to be trivial to
> do, and we shouldn't assume that sticking everything together into one big
> type will actually make the code _simpler_.
> 
> The way I think about the code right now, "channel" is an interface which
> "connection_or" implements, and there is no meaningful barrier between
> connection_or and channeltls.  I _do_ like the idea of keeping some kind of
> abstraction barrier, though: a "channel" is "whatever we can send and
> receive cells from", whereas an "or_connection" has a lot of other baggage
> that comes with it.
> 
> From my POV, we *should* definitely abolish the channels' queues, and
> minimize the amount of logic that channels do on their own. I'm not sure if
> we should rip them out entirely, or just simplify them a lot. I don't think
> either necessarily simpler or less bug-prone than the other.
> 
> Perhaps we should sketch out what the new interface would look like?  Or
> maybe do an hour or two worth of exploratory hacking on each approach?

I tend to agree here with you. I did the exercise already to remove the
channel queues and I ran a relay for weeks on it without any apparent
problems. It removed 1000+ lines and made everything simpler.

So, I do think having the "channel" abstraction is good and both in terms of
future use of channels but also in terms of maintanability.

That being said, I think the next move for me will be to act on removing the
channel queues and try to decouple channeltls and connection_or so we can end
up with a real abstraction. Achieving this will provide us some more
encapsulation between subsystem and thus be able to provide strong guarantees
between them a.k.a between channel and scheduler.

I do think the current channel interface is "ok" for now but as we do this
exercise and refactoring, we'll discover more things which will just improve
the subsystem overall.

> 
> (This reminds me of another change I want someday, which is splitting
> edge_connection_t into an "edge_connection" type that implements a "stream"
> interface: right now, we have quite a few streams that aren't actually edge
> connections, but which use the type anyway.)
> 
> * Dealing with the DESTROY cell design issue will require a bit more tricky
> >   work I think. We must not starve circuit with a DESTROY cell pending to
> >   be sent else the other side keeps sending data. But we should also not
> >   starve all the circuits because if we ever need to send a gazillion
> >   DESTROY cell in priority, we'll make the relay useless (DoS vector).
> >
> >   The question is, do we trust our EWMA policy to be wise enough to pick
> >   the circuit in a reasonable amount of time so we can flush the DESTROY
> >   cell from the circuit queue? Or we really need to bypass or prioritize
> >   somehow that cell in order to send them asap in order to avoid load on
> >   the network because the other side of the circuit is still sending?
> >
> 
> So, elsewhere in the thread, folks have been discussing whether a circuit
> that's going to send a DESTROY cell should flush its pending cells first.
> 
> The answer is sadly, "it depends".
> 
> Case 1: Suppose Alice is downloading a webpage.  Suppose we are the middle
> relay and we lose our connection to the exit.  It would be nice to keep
> flushing the data we have towards Alice -- maybe.  If she can use partial
> data.  But any data that Alice sends to us would be lost, so it would be
> good if we had some way to tell Alice "stop sending please".
> 
> Case 2: Suppose Alice is uploading something to a webserver. Suppose we are
> the middle relay and we lose our connection from Alice. In this case,
> there's no point in sending any more data towards the webserver before we
> send it a DESTROY cell.  (Even if Alice was in the middle of a big upload,
> she'll need to repeat any part of it that wasn't ACKed, since she won't
> know what was received and what wasn't.)
> 
> Case 3: Suppose we hit our OOM killer.  In this case, we had better discard
> all the data on the circuit we're killing, or we're vulnerable to "sniper
> attacks" again.
> 
> So it's clear that sometimes we should dump the data, and sometimes we
> shouldn't.  I think this is an independent question from what we're asking
> here.  (My own take is that solving case 1 right requires "RELAY_TRUNCATED"
> cells, which I believe we don't implement today.)

For the record, right now in tor, whatever (1) or (2), when the middle relay
decides it needs to send a DESTROY, circuit queue are cleared (nothing is
sent) and a DESTROY is sent very fast.

> 
> What we're asking here is: how can we reintegrate DESTROY cells with the
> rest of the scheduler logic?
> 
> I think that, from a priority POV, DESTROY cells are in a certain sense the
> _opposite_ of traffic, and we might actually want to treat them differently
> from data cells.  Consider that if we have a choice between DESTROYing a
> busy circuit or a quiet one, we will save more bandwidth by destroying the
> busy circuit first, so that no more data is sent to us over it.
> 
> On the other hand, this doesn't mean that the FIFO structure we have today
> is a good idea at all.  It probably makes sense to use the same priority
> queue-based scheduler thing that we use everywhere else, but possibly with
> a different (inverted??) priority parameter for destroyed circuits.

(We kind of need the FIFO concept for cells afaict because of the parent
relationship between cells with their digest (à la git). And that is of course
per circuit.)

> 
> One more thing to be aware of: the destroy_cell_queue exists in part
> because we tear down circuits at the same time that we queue their destroy
> cells.  If we changed Tor so that "destroyed" circuits were kept around
> somehow until their cells could be sent, then we'd be introducing a new
> state to our state machine, to represent circuits that were schedulable but
> not actually usable for traffic.  We'd need to be careful to handle that
> correctly: this kind of "unusable object that still exists" has caused us
> problems before.   (The solution I like best for avoiding this confusion is
> to make it so the scheduler can schedule two types of "schedule-able"
> things: circuits, and "pending destroy cells".)

After some thoughts, yes I do think you are right here. For the two reasons
you mentionned:

1. DESTROY cell needs to be considered differently in terms of priority
   depending on the scheduler. KIST for instance, we might think that we want
   to bypass the kernel limit and just dump it asap. But we would also need to
   select the active circuit using a different priority policy (inverted) as
   you suggested to prioritize busy circuit.

2. We can't keep circuit objects alive while we wait for the destroy cell, the
   logic inside tor might become complicated and kind of also opens a way for
   memory DoS if we ever have a DESTROY cell starvation problem in tor.

Then the problem becomes how do those two queues interact with each other
(normal and DESTROY cells). The issue at hands becomes possible starvation.

Right now, Tor does a back and forth between sending a DESTROY cell and a
normal cell (regardless of the circuit, actually never for the same circuit).
The destroy cell queue is a FIFO which is not ideal with what we've discussed
but overtime we'll make sure to avoid starvation because DESTROY cells aren't
picked by priority.

Thus, if the DESTROY queue becomes a priority queue, starvation becomes a
problem. And extra difficulty, the priority in the normal cell queue can't be
compared to the one of the destroy cell queue because the former evolves over
time and the later is frozen in time because the circuit simply doesn't exists
anymore. In other words, we can't consider the two queues' priorities within
the same policy.

One avenue of solution to this is having some sort of "throtlling" or a
"threshold" that the scheduller allows a queue to be emptied so the other
queue doesn't starve. Chances are that the threshold in our cases would be a
maximum number of cells before the destroy cell queue yields its time back to
the normal cell queue which also would need a threshold. And to be efficient
there, good chance that threshold would be adaptative of the load of the
relay.

All in all, this ain't an easy problem so simply switching the DESTROY cell
queue to a priority one will need important changes.

We are getting in the land of proposal so I'll stop for now, do some more
homework in the starvation problem in scheduling (vis-a-vis routing) and write
something up in a proposal so we can go from there. But at least now, we've
come up with some agreement and more data points on how DESTROY cells needs to
be handled in Tor.

> 
> > * In the short term, we should get rid of Vanilla scheduler because it
> >   complefixies a lot the scheduler code by adding uneeded things to
> > channel_t
> >   but also bloated the scheduler interface with pointless function
> > pointers for
> >   instance. And in my opinion, it is not helping performance the way it is
> > done
> >   right now.
> >
> 
> I agree with Roger here: it's fine to throw away the vanilla scheduler, but
> we should wait until KIST has been running unproblematically in a stable
> release for a while.  0.3.4 seems like a good time for this.

Agree.

Thanks!
David

> 
> -- 
> Nick

> _______________________________________________
> tor-dev mailing list
> tor-dev at lists.torproject.org
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev

-- 
Y7bJ1MNf9h5n1CLO6oGz6GAWskRPqhTOELJMs7AdqGg=
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 455 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20171116/b1303776/attachment.sig>