[tor-bugs] #31422 [Circumvention/BridgeDB]: Make BridgeDB report internal metrics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Jun 4 09:14:11 UTC 2020
#31422: Make BridgeDB report internal metrics
-------------------------------------------------+-------------------------
Reporter: phw | Owner: phw
Type: enhancement | Status: needs_review
Priority: Medium | Milestone:
Component: Circumvention/BridgeDB | Version:
Severity: Normal | Resolution:
Keywords: s30-o21a1, anti-censorship-roadmap-2020 | Actual Points:
Parent ID: #31274 | Points: 2
Reviewer: | Sponsor: Sponsor30-can
-------------------------------------------------+-------------------------
Comment (by karsten):
Replying to [comment:12 phw]:
> I think it's time for a review of what I've done so far:
> https://github.com/NullHypothesis/bridgedb/compare/enhancement/31422
I took a brief look at the new metrics captured by your patch:
> Here are the internal metrics that the patch is currently capturing:
> * Number of IPv4/IPv6 requests.
You're already counting lots of requests and reporting binned numbers, so
this should be fine.
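To illustrate what "binned" means here, below is a minimal sketch that rounds a counter up to the next multiple of a bin size. With binning, small differences between counts, including very small counts, are not visible in the published numbers. The function name `bin_count`, the bin size of 10, and the counter names are hypothetical. I haven't checked them against BridgeDB's actual metrics code.

```python
from collections import Counter

# Hypothetical bin size; BridgeDB's actual value may differ.
BIN_SIZE = 10


def bin_count(count: int, bin_size: int = BIN_SIZE) -> int:
    """Round a non-negative count up to the next multiple of bin_size.

    Zero stays zero; any positive count is reported as at least bin_size,
    so e.g. 1 and 9 requests both show up as 10.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    # Integer ceiling division avoids floating-point rounding issues.
    return ((count + bin_size - 1) // bin_size) * bin_size


# Hypothetical per-protocol request counters.
requests = Counter({"ipv4": 1234, "ipv6": 7})
binned = {proto: bin_count(n) for proto, n in requests.items()}
print(binned)  # {'ipv4': 1240, 'ipv6': 10}
```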
> * Min, max, median, and stdev of the number of users that bridges were
> handed out to.
I don't see any privacy issues with computing and reporting these four
statistics.
I'm less sure about how useful they will be. The median will likely be the
most interesting statistic here, but the min and max only tell you about
the smallest and largest outliers, not much about what the distribution
looks like. I'm also not sure how useful the standard deviation will be.
Would it be an option to add quantiles? Your comment suggests that you'd
have to require Python 3.8 in order to use the quantiles() function of the
built-in statistics module. Did you consider using SciPy/NumPy to compute
these instead? If neither of those is an option, though, I'd recommend
against computing quantiles yourself, because there are just too many ways
to screw up.
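Here is a minimal sketch of getting the median together with the first and third quartiles from the standard library, assuming Python 3.8+ is acceptable. `method="inclusive"` interpolates linearly between data points. That should give the same values as `numpy.percentile(values, [25, 50, 75])` with NumPy's default settings, which may help if you go the NumPy route instead. The helper name `quartiles` and the sample data are hypothetical.

```python
import statistics
from typing import List, Tuple


def quartiles(values: List[float]) -> Tuple[float, float, float]:
    """Return (first quartile, median, third quartile) of values.

    Requires Python 3.8+ for statistics.quantiles(). method="inclusive"
    treats the data as the whole population and interpolates linearly.
    """
    if len(values) < 2:
        # Older Python versions reject fewer than two data points.
        raise ValueError("need at least two data points")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical data: number of users each bridge was handed out to.
users_per_bridge = [1, 2, 2, 3, 4, 5, 8, 13, 40]
print(quartiles(users_per_bridge))  # (2.0, 4.0, 8.0)
```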
If you have quantiles, you might want to include the first and third
quartiles as well as the smallest and largest non-outliers, i.e., values
within 1.5 inter-quartile ranges below the first quartile and above the
third quartile. Together with the median, those are the five values you'd
also find in a boxplot. We're computing these five values in our
[https://metrics.torproject.org/onionperf-latencies.html OnionPerf latency
statistics]. [https://gitweb.torproject.org/metrics-web.git/tree/src/main/sql/onionperf/init-onionperf.sql#n187 Here]'s the
SQL code that we use. (I don't think we have Python code around for
computing the high and low values.)
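As a rough starting point for the high and low values, here is a minimal Python sketch of the five boxplot values. It assumes the common Tukey convention: outliers are values more than 1.5 inter-quartile ranges below the first quartile or above the third quartile. It also assumes linear quartile interpolation (Python 3.8+). I haven't verified that it reproduces the OnionPerf SQL exactly, so please treat it as an illustration. The names `BoxplotStats` and `boxplot_stats` are hypothetical.

```python
import statistics
from typing import List, NamedTuple


class BoxplotStats(NamedTuple):
    low: float     # smallest value >= q1 - 1.5 * IQR
    q1: float      # first quartile
    median: float
    q3: float      # third quartile
    high: float    # largest value <= q3 + 1.5 * IQR


def boxplot_stats(values: List[float]) -> BoxplotStats:
    """Compute the five values shown in a Tukey-style boxplot."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # The whiskers end at the most extreme observations inside the fences;
    # values beyond them are outliers and are not reported.
    low = min(v for v in values if v >= lower_fence)
    high = max(v for v in values if v <= upper_fence)
    return BoxplotStats(low, q1, median, q3, high)


# Hypothetical data: number of users each bridge was handed out to.
users_per_bridge = [1, 2, 2, 3, 4, 5, 8, 13, 40]
stats = boxplot_stats(users_per_bridge)
print(stats)  # BoxplotStats(low=1, q1=2.0, median=4.0, q3=8.0, high=13)
```

In this example, 40 lies above the upper fence (8 + 1.5 * 6 = 17), so the reported high value is 13.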
If you want to start with somewhat simpler statistics, be sure to include
the first and third quartiles together with the median. You could always
add the high and low values later if you need them.
> * The number of empty responses per distributor.
> * The number of bridges per (sub)hashring.
Like the first number, I don't see an issue with reporting these binned
numbers.
> In the meanwhile, I'll spend some more time thinking about the other
metrics suggestions in this ticket.
Let me know if you want me to take another look!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/31422#comment:13>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online