[tor-bugs] #25153 [Core Tor/Tor]: Specify how PrivCount in Tor statistics are configured and interpreted

Thu Apr 19 04:37:26 UTC 2018

#25153: Specify how PrivCount in Tor statistics are configured and interpreted
-------------------------------------------------+-------------------------
 Reporter:  teor                                 |          Owner:  teor
     Type:  task                                 |         Status:
                                                 |  assigned
 Priority:  Medium                               |      Milestone:  Tor:
                                                 |  unspecified
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  needs-proposal, privcount,           |  Actual Points:
  034-triage-20180328, 034-removed-20180328      |
Parent ID:  #22898                               |         Points:  5
 Reviewer:                                       |        Sponsor:
                                                 |  SponsorQ
-------------------------------------------------+-------------------------

Comment (by teor):

 I think I have worked out how we can handle version upgrades for PrivCount
 in Tor.

 We can safely add counters to an existing PrivCount in Tor deployment if
 we increase the noise added by older versions (or turn them off if they
 are too old). If we specify a noise ratio for each stats version in the
 consensus, then we can increase the noise in older versions each time we
 release a new version. If we remove counters, we can either accept the
 excess noise (which is safe), or increase the noise by less when we add
 the next counter.
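
 A minimal sketch of how a relay might apply a per-version noise ratio.
 The parameter name, the ratio values, and the function are hypothetical
 and only illustrate the idea; none of them come from a spec or from Tor's
 code.

```python
from typing import Dict, Optional

# Hypothetical consensus parameter: stats version -> noise ratio.
# The values are made up for illustration. Older versions get a larger
# ratio; versions that are too old are left out, which turns them off.
NOISE_RATIOS: Dict[int, float] = {1: 2.0, 2: 1.5, 3: 1.0}


def noise_sigma(stats_version: int, base_sigma: float,
                ratios: Dict[int, float]) -> Optional[float]:
    """Return the noise standard deviation for a relay running
    stats_version, or None if that version should not collect at all."""
    ratio = ratios.get(stats_version)
    if ratio is None:
        # Version not listed in the consensus: too old, stop collecting.
        return None
    return base_sigma * ratio


print(noise_sigma(3, 10.0, NOISE_RATIOS))  # newest version, base noise
print(noise_sigma(1, 10.0, NOISE_RATIOS))  # older version, more noise
print(noise_sigma(0, 10.0, NOISE_RATIOS))  # unlisted version, disabled
```

 Releasing a new stats version would then mean publishing a new set of
 ratios, with the older versions' ratios increased.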

 Two further thoughts on PrivCount in Tor:

 1. Each counter we add will add more noise. To get the same relative
 noise, we can add more collecting relays. I wonder if we can calculate how
 much consensus weight we need to add per counter.

 2. We can discover relays that submit out-of-range data without
 specialised oblivious computation:
 a) the tally reporters do an aggregation round where each relay is in its
 own partition
 b) to protect relays which actually reported statistics, each tally
 reporter adds enough noise to hide the largest valid relay
 c) if a relay's total is far enough from the mean (4 or 5 sigmas,
 positive or negative), it is probably bad. A 4 sigma deviation happens
 about 1 time in 16,000, so for 6000 relays we would expect about 0.38
 false positives (roughly a 32% chance of at least one). So we might need
 5 sigma if we ever deploy to the whole network. See
 https://en.m.wikipedia.org/wiki/Normal_distribution#Standard_deviation_and_coverage
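
 The false positive figures for a given sigma threshold can be checked
 with a few lines of Python. This is a sketch that assumes the per-relay
 totals are normally distributed and independent; the function names are
 only for illustration.

```python
import math


def two_sided_tail(k_sigma: float) -> float:
    """Probability that a normal variable lies more than k_sigma
    standard deviations from its mean, in either direction."""
    return math.erfc(k_sigma / math.sqrt(2))


def false_positive_stats(k_sigma: float, n_relays: int):
    """Return (per-relay probability, expected number of false
    positives, probability of at least one false positive)."""
    p = two_sided_tail(k_sigma)
    expected = n_relays * p
    at_least_one = 1.0 - (1.0 - p) ** n_relays
    return p, expected, at_least_one


for k in (4, 5):
    p, expected, at_least_one = false_positive_stats(k, 6000)
    print(f"{k} sigma: 1 in {1 / p:,.0f} per relay, "
          f"expected {expected:.4f}, "
          f"P(at least one) {at_least_one:.4f}")
```

 For 6000 relays, 4 sigma gives about 0.38 expected false positives, which
 is roughly a 32% chance of at least one. At 5 sigma the expected number
 drops to about 0.003.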

 If we ran out-of-bounds detection before every aggregation, we could use
 it to detect grossly inadequate noise. That would make it safe to collect
 protocol warnings even if we get the action bounds (expected activity for
 a single client) wrong.
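
 A rough sketch of running the detection step before aggregation. It
 assumes the tally reporters already hold one noisy total per relay and
 know the sigma of the added noise. It measures distance from the median
 rather than the mean, so that one very large total cannot drag the centre
 towards itself; that substitution is an assumption of this sketch, as are
 all the names and numbers.

```python
import statistics
from typing import Dict, List, Set


def find_out_of_range(noisy_totals: Dict[str, float],
                      noise_sigma: float,
                      k_sigma: float = 5.0) -> List[str]:
    """Return the relays whose noisy total is more than k_sigma noise
    standard deviations away from the median total."""
    centre = statistics.median(noisy_totals.values())
    limit = k_sigma * noise_sigma
    return sorted(relay for relay, total in noisy_totals.items()
                  if abs(total - centre) > limit)


def aggregate(noisy_totals: Dict[str, float], excluded: Set[str]) -> float:
    """Sum the totals of every relay that was not excluded."""
    return sum(total for relay, total in noisy_totals.items()
               if relay not in excluded)


# Made-up per-relay totals; relayE is far outside the plausible range.
totals = {"relayA": 120.0, "relayB": 95.0, "relayC": 101.0,
          "relayD": 88.0, "relayE": 9000.0}
bad = find_out_of_range(totals, noise_sigma=50.0)
print(bad)
print(aggregate(totals, set(bad)))
```

 Only the relays that pass the check contribute to the aggregate.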

 But per-relay partitions enable an attack: an adversary pushes a client's
 activity on a guard over the threshold, so that the guard is excluded from
 the stats. To defend against this, we could set the exclusion threshold so
 high that it is physically impossible to have that many events.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25153#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online