[tor-bugs] #25141 [Core Tor/Tor]: enabling CellStatistics results in gigabytes of incremental memory consumption
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Feb 7 20:30:46 UTC 2018
#25141: enabling CellStatistics results in gigabytes of incremental memory
consumption
-------------------------------------------------+-------------------------
 Reporter:   starlight                           | Owner:         (none)
 Type:       defect                              | Status:        new
 Priority:   High                                | Milestone:     Tor: 0.3.4.x-final
 Component:  Core Tor/Tor                        | Version:       Tor: 0.3.2.9
 Severity:   Major                               | Resolution:
 Keywords:   031-backport, 029-backport, ddos,   | Actual Points:
             must-fix-before-033-stable          |
 Parent ID:  #24806                              | Points:        1
 Reviewer:                                       | Sponsor:
-------------------------------------------------+-------------------------
Comment (by robgjansen):
TL;DR
These stats are useful for traffic modeling. I think a reasonable plan is
to recommend phasing out cell statistics, with the long-term goal of
replacing them with more useful privacy-preserving measurements, such as
bytes-per-stream and streams-per-circuit, collected using the new PrivCount
protocol when it's ready.
Data point about RAM usage:
I've had the `CellStatistics` option enabled on my fast relays for a
couple of years now. I've noticed that RAM usage on any of the relays
would occasionally increase above 2 GiB (even when I temporarily
experimented with `MaxMemInQueues 512 MB`), but I never linked it to
`CellStatistics`. I've now disabled `CellStatistics` because RAM has
recently become a bit of a problem for other reasons.
Do we still care about cell statistics?
Theoretically, cell statistics can help us understand how well our test
networks model the conditions of the real Tor network, which matters for
test networks that actually try to model those conditions faithfully (like
[https://shadow.github.io Shadow]). For example, we could run a bunch of
clients in the test network that initiate stream transfers according to
some understanding of client usage, then collect the cell stats on the
relays in the test network and compare them to the stats from the public
network. How close the two are provides a measure of fidelity. I actually
did this in the original
[http://www.robgjansen.com/publications/shadow-ndss2012.pdf Shadow paper].
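
To make the comparison step concrete, here is a minimal sketch of one way to score how close two sets of per-relay cell stats are. It uses a two-sample Kolmogorov-Smirnov distance, which is my choice for illustration and not necessarily what the Shadow paper used. The input values are placeholders, not measurements, and the function names are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of values in sorted_sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs. 0.0 means identical, 1.0 means disjoint."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    # The maximum gap is always attained at one of the sample points.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


# Placeholder inputs (assumption): one value per relay, e.g. mean cells
# processed per circuit. These numbers are made up for illustration.
public_network = [12.0, 15.5, 9.8, 20.1, 14.2]
test_network = [11.5, 16.0, 10.2, 19.0, 13.9]

distance = ks_distance(public_network, test_network)
print(f"KS distance between public and test network: {distance:.3f}")
```

A smaller distance would suggest the test network reproduces the public network's cell-level behavior more closely, under whatever statistic is fed in.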
However, circuit-level cell information by itself is not the greatest for
informing how the clients in the test network should generate traffic in
the first place. For that, we need a combination of stream-level and
circuit-level information. From a client modeling perspective, I need to
know how many streams each client should create, how much to download on
each stream, how many pauses to add and how long to pause, etc. Circuits
and cells are a second-level effect of the client model.
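
As a rough illustration of what such a client model needs as input, here is a minimal sketch that draws a stream schedule for one client from three parameters: streams per client, bytes per stream, and pause length. Everything in it is an assumption for illustration: the exponential distributions, the default means, and the names `StreamPlan` and `plan_client` are hypothetical and are not taken from Shadow or from measured data.

```python
import random
from dataclasses import dataclass


@dataclass
class StreamPlan:
    start_offset_s: float  # seconds after the client starts
    download_bytes: int    # how much to download on this stream


def plan_client(rng, mean_streams=5.0, mean_bytes=50_000.0, mean_pause_s=10.0):
    """Draw a stream schedule for one hypothetical client.

    The distributions and default means are placeholders; in a real model
    they would come from stream-level and circuit-level measurements.
    """
    n_streams = max(1, round(rng.expovariate(1.0 / mean_streams)))
    plans = []
    t = 0.0
    for _ in range(n_streams):
        size = max(1, int(rng.expovariate(1.0 / mean_bytes)))
        plans.append(StreamPlan(start_offset_s=t, download_bytes=size))
        # Pause before the next stream starts.
        t += rng.expovariate(1.0 / mean_pause_s)
    return plans


plans = plan_client(random.Random(0))
for p in plans:
    print(f"t={p.start_offset_s:8.2f}s  download {p.download_bytes} bytes")
```

Note that circuits and cells never appear as inputs here; they would fall out of running clients like this through the network, which is the sense in which they are a second-level effect.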
The privacy implications are also worth considering (some concerns were
raised
[https://lists.torproject.org/pipermail/metrics-team/2016-January/000057.html here],
and the discussion moved
[https://lists.torproject.org/pipermail/tor-dev/2016-February/010398.html here]).
I think individual relay stats are much more specific than what we
need; for modeling purposes, we lose very little utility by using
aggregate network results, and they are much safer.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25141#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online