[tor-bugs] #25141 [Core Tor/Tor]: enabling CellStatistics results in gigabytes of incremental memory consumption

Wed Feb 7 20:30:46 UTC 2018

#25141: enabling CellStatistics results in gigabytes of incremental memory
consumption
-------------------------------------------------+-------------------------
 Reporter:  starlight                            |          Owner:  (none)
     Type:  defect                               |         Status:  new
 Priority:  High                                 |      Milestone:  Tor:
                                                 |  0.3.4.x-final
Component:  Core Tor/Tor                         |        Version:  Tor:
                                                 |  0.3.2.9
 Severity:  Major                                |     Resolution:
 Keywords:  031-backport, 029-backport, ddos,    |  Actual Points:
  must-fix-before-033-stable                     |
Parent ID:  #24806                               |         Points:  1
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by robgjansen):

 TL;DR

 These stats are useful for traffic modeling. I think a fine plan is to
 recommend phasing out cell statistics, with the long-term goal of
 replacing them with more useful privacy-preserving measurements, such as
 bytes-per-stream and streams-per-circuit, collected using the new
 PrivCount protocol when it's ready.

 Data point about RAM usage:

 I've had the `CellStatistics` option enabled on my fast relays for a
 couple of years now. I've noticed that RAM usage on any of the relays
 would occasionally increase above 2 GiB (even when I temporarily
 experimented with `MaxMemInQueues 512 MB`), but I never linked it to
 `CellStatistics`. I've now disabled `CellStatistics` because RAM has
 recently become a bit of a problem for other reasons.

 Do we still care about cell statistics?

 Theoretically, cell statistics can help us understand how well our test
 networks model the conditions of the real Tor network, which would be
 useful for test networks that try to do so faithfully (like
 [https://shadow.github.io Shadow]). For example, we could run a bunch of
 clients in the test network
 that are initiating stream transfers according to some understanding of
 client usage, and then we collect the cell stats on the relays in the test
 network and compare them to the stats from the public network. How close
 we are provides a measure of fidelity. I actually did this in the original
 [http://www.robgjansen.com/publications/shadow-ndss2012.pdf Shadow paper].
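
 As a rough illustration of that comparison, here is a minimal sketch. It
 assumes each side has been reduced to a list of per-decile values (for
 example, mean processed cells per circuit); the function name and the
 choice of mean relative error as the fidelity measure are mine, not what
 the Shadow paper used.

```python
def mean_relative_error(test_deciles, public_deciles):
    """Average relative gap between two equal-length lists of decile values.

    Hypothetical fidelity measure: 0.0 means the test network matches the
    public network exactly; larger values mean a worse match.
    """
    if not public_deciles or len(test_deciles) != len(public_deciles):
        raise ValueError("need two non-empty lists of the same length")
    gaps = []
    for test_value, public_value in zip(test_deciles, public_deciles):
        if public_value == 0:
            # Avoid dividing by zero: an exact match is no gap,
            # anything else is counted as a full gap.
            gaps.append(0.0 if test_value == 0 else 1.0)
        else:
            gaps.append(abs(test_value - public_value) / public_value)
    return sum(gaps) / len(gaps)
```

 A lower value suggests the test network's cell stats sit closer to the
 public network's; any threshold for "close enough" would be a judgment
 call.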

 However, circuit-level cell information by itself is not the greatest for
 informing how the clients in the test network should generate traffic in
 the first place. For that, we need a combination of stream-level and
 circuit-level information. From a client modeling perspective, I need to
 know how many streams each client should create, how much to download on
 each stream, how many pauses to add and how long to pause, etc. Circuits
 and cells are a second-level effect of the client model.
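
 To make the client-model inputs concrete, here is a small sketch of what
 such a model has to decide for each client. Everything in it is an
 assumption for illustration: the names, the parameters, and especially
 the exponential distributions, which are placeholders for whatever
 stream-level measurements would actually tell us.

```python
import random
from dataclasses import dataclass


@dataclass
class StreamPlan:
    download_bytes: int         # how much to download on this stream
    pause_after_seconds: float  # how long to pause before the next stream


def plan_client_session(rng, mean_streams, mean_bytes, mean_pause_seconds):
    """Draw one client's session: a list of streams with sizes and pauses.

    The exponential draws are placeholders, not measured distributions.
    """
    # Every client creates at least one stream.
    stream_count = max(1, round(rng.expovariate(1.0 / mean_streams)))
    plans = []
    for _ in range(stream_count):
        size = max(1, round(rng.expovariate(1.0 / mean_bytes)))
        pause = rng.expovariate(1.0 / mean_pause_seconds)
        plans.append(StreamPlan(download_bytes=size, pause_after_seconds=pause))
    return plans


# Example with made-up parameters; the seed makes the run repeatable.
session = plan_client_session(random.Random(1), mean_streams=5,
                              mean_bytes=200_000, mean_pause_seconds=10.0)
```

 Circuit and cell counts would then fall out of running sessions like
 this through the test network, rather than being inputs themselves.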

 The privacy implications are also worth considering (some concerns raised
 [https://lists.torproject.org/pipermail/metrics-
 team/2016-January/000057.html here], and moved
 [https://lists.torproject.org/pipermail/tor-dev/2016-February/010398.html
 here]). I think individual relay stats are much more specific than what we
 need; for modeling purposes, we lose very little utility by using
 aggregate network results, and they are much safer.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25141#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online