[tor-bugs] #24665 [Core Tor/Tor]: sched: In KIST, the extra_space kernel value needs to be allowed to be negative

Tue Dec 19 20:28:37 UTC 2017

#24665: sched: In KIST, the extra_space kernel value needs to be allowed to be
negative
------------------------------+--------------------------------
     Reporter:  dgoulet       |      Owner:  dgoulet
         Type:  defect        |     Status:  assigned
     Priority:  Very High     |  Milestone:  Tor: 0.3.2.x-final
    Component:  Core Tor/Tor  |    Version:  Tor: 0.3.2.1-alpha
     Severity:  Normal        |   Keywords:  tor-sched
Actual Points:                |  Parent ID:
       Points:                |   Reviewer:
      Sponsor:                |
------------------------------+--------------------------------
 KIST, when updating the TCP socket information, computes a limit on the
 number of bytes we are allowed to write on the socket of the given active
 channel.

 First, `tcp_space` tells us how much TCP buffer space we have in the
 kernel for this socket. The computation is below; the comment in
 `update_socket_info_impl()` explains the reasoning behind it:

 {{{
 tcp_space = (ent->cwnd - ent->unacked) * (int64_t)(ent->mss);
 }}}
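
 As a minimal sketch of that computation with concrete numbers: the struct
 and function names below are hypothetical stand-ins, the 32-bit unsigned
 field types are an assumption, and the guard against `unacked` exceeding
 `cwnd` is an addition for this sketch rather than something the ticket
 describes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-socket values KIST reads from the
 * kernel. Field names follow the ticket; the 32-bit types are assumed. */
typedef struct {
  uint32_t cwnd;    /* congestion window, in packets */
  uint32_t unacked; /* packets sent but not yet ACKed */
  uint32_t mss;     /* maximum segment size, in bytes */
} sock_sample_t;

/* Bytes the congestion window still has room for. */
int64_t
sketch_tcp_space(const sock_sample_t *s)
{
  /* Guard added for this sketch: with unsigned fields, cwnd - unacked
   * would wrap around if unacked were ever larger than cwnd. */
  if (s->cwnd < s->unacked)
    return 0;
  return (int64_t)(s->cwnd - s->unacked) * (int64_t)s->mss;
}
```

 With cwnd = 10, unacked = 4 and mss = 1448, this gives 6 * 1448 = 8688
 bytes of `tcp_space`.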

 After that, we compute an `extra_space` that lets us hand the kernel a bit
 more data than `tcp_space` allows. That way, when the ACKs come back for
 the packets occupying `tcp_space`, the kernel can immediately send from
 that extra data instead of waiting on the scheduler to feed it more. Here
 is how it is computed:

 {{{
 extra_space =
   clamp_double_to_int64(
     (ent->cwnd * (int64_t)ent->mss) * sock_buf_size_factor) -
 ent->notsent;
 }}}

 It uses the `notsent` value, which is the size of the kernel queue holding
 data that has *not* been sent yet. That data is not reflected in the
 `unacked` value because it has not gone out on the wire.
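
 To see how `notsent` can push `extra_space` below zero, here is a small
 sketch of the formula above. `sketch_clamp_to_int64()` is a hypothetical
 stand-in for Tor's `clamp_double_to_int64()`, not its actual
 implementation, and the factor of 1.0 used in the example is only
 illustrative.

```c
#include <stdint.h>

/* Hypothetical stand-in for Tor's clamp_double_to_int64(): saturate
 * instead of overflowing when converting a double to int64_t. */
int64_t
sketch_clamp_to_int64(double x)
{
  if (x != x) /* NaN */
    return 0;
  if (x >= 9223372036854775808.0) /* 2^63 */
    return INT64_MAX;
  if (x <= -9223372036854775808.0)
    return INT64_MIN;
  return (int64_t)x;
}

/* Same shape as the ticket's formula; assumes factor >= 0 so the clamped
 * term is never negative before notsent is subtracted. */
int64_t
sketch_extra_space(uint32_t cwnd, uint32_t mss, uint32_t notsent,
                   double factor)
{
  int64_t window_bytes = (int64_t)cwnd * (int64_t)mss;
  return sketch_clamp_to_int64((double)window_bytes * factor) -
         (int64_t)notsent;
}
```

 With cwnd = 10, mss = 1448 and a factor of 1.0, a `notsent` of 4000 bytes
 leaves 14480 - 4000 = 10480 bytes of extra space, while a `notsent` of
 20000 bytes yields 14480 - 20000 = -5520.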

 That queue can be large, sometimes bigger than the `tcp_space` we computed
 above, because the congestion window moves over time and the kernel can
 move as much as it wants from the congestion window into the output
 queue; that is the TCP stack's black magic. One minute cwnd = 10 and the
 next it is 67.

 If `extra_space` becomes negative because `notsent` is bigger than the
 current congestion window (in bytes, scaled by `sock_buf_size_factor`),
 the regular `tcp_space` needs to shrink accordingly. Right now, we only
 add `extra_space` when it is positive, but in reality the current
 `tcp_space` needs to take the `notsent` size into account as well.

 Bottom line: if `tcp_space + extra_space` ends up < 0, the allowed limit
 needs to be `0` and not whatever `tcp_space` is.
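
 The difference between the two behaviours can be sketched as follows.
 Both function names are hypothetical, and this is only an illustration of
 the rule stated in this ticket, not the actual patch. It assumes
 `tcp_space` is non-negative and that both inputs are small enough for
 their sum to fit in an int64_t.

```c
#include <stdint.h>

/* Behaviour described in this ticket as current: a negative extra_space
 * is ignored, so the limit never drops below tcp_space. */
uint64_t
sketch_limit_current(int64_t tcp_space, int64_t extra_space)
{
  if (extra_space > 0)
    return (uint64_t)(tcp_space + extra_space);
  return (uint64_t)tcp_space;
}

/* Behaviour this ticket asks for: a negative extra_space shrinks the
 * limit, and a negative sum means nothing may be written for now. */
uint64_t
sketch_limit_proposed(int64_t tcp_space, int64_t extra_space)
{
  int64_t sum = tcp_space + extra_space;
  return sum < 0 ? 0 : (uint64_t)sum;
}
```

 For a `tcp_space` of 8688 and an `extra_space` of -20000, the first
 function still allows 8688 bytes while the second allows 0. With an
 `extra_space` of -5520, the second allows 8688 - 5520 = 3168 bytes.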

 We found this issue while looking at very loaded relays that kept putting
 data in the outbuf while the connection socket was not ready to write. We
 realized that the `notsent` queue was huge, yet KIST kept allowing more
 bytes to be written over and over again, filling the outbuf, and thus
 memory, at a rapid rate.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24665>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online