[metrics-bugs] #9316 [Circumvention/BridgeDB]: BridgeDB should export statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Jun 11 23:08:05 UTC 2019
#9316: BridgeDB should export statistics
-------------------------------------------------+-------------------------
Reporter: asn                                    | Owner: phw
Type: task                                       | Status: assigned
Priority: Medium                                 | Milestone:
Component: Circumvention/BridgeDB                | Version:
Severity: Normal                                 | Resolution:
Keywords: metrics, bridgedb, prometheus,         | Actual Points:
  ex-sponsor-19, anti-censorship-roadmap         |
Parent ID: #19332                                | Points: 3
Reviewer:                                        | Sponsor: Sponsor30-must
-------------------------------------------------+-------------------------
Comment (by phw):
We just heard back from Tor's Research Safety Board. You can find the
response below. The reviewer writes that our proposal wouldn't be an issue
in a one-off setting but could be problematic in the long run. I think a
reasonable way forward would be to implement the proposal, run it in a
one-off setting for, say, a week, and then evaluate if we should change
data collection. In the long run, we should also transition to PrivCount
as the reviewer mentions.
{{{
Tor Research Safety Board Paper #20 Reviews and Comments
===========================================================================
Paper #20 Collecting BridgeDB usage statistics
Review #20A
===========================================================================
* Updated: 11 Jun 2019 6:02:53pm EDT
Overall merit
-------------
4. Accept
Reviewer expertise
------------------
3. Knowledgeable
Paper summary
-------------
The document proposes collecting a new set of usage statistics through data
available from the operation of BridgeDB. The statistics would be useful for
better prioritizing development tasks, to improve reaction time to bridge
enumeration attacks and blockages, to reduce failure rates, and to help
promote censorship circumvention research.
Comments for author
-------------------
If this was a short term study, I would say go for it, no questions asked.
The benefits are clear and I agree that they outweigh the risks.
However, I think it was implied (although not explicitly stated) that the
new statistics would be regularly collected and published on an ongoing
basis. I think there are more risks associated with such an ongoing
collection as opposed to a one-off or short term study, so we should
carefully consider the trade-offs between the cost/effort of safer
collection methods and the privacy benefits of such methods.
The most concerning statistics to me are the per-country statistics and the
per-service (gmail, yahoo, etc.) statistics. I think it is clear from
Sections 3 and 4 that you understand the risks associated with collecting
these statistics: a single user from an unpopular country could be
identified because the 1-10 bucket suddenly changed from a 0 count to a 1
count. This issue might also exist if unpopular email service providers are
selected. This issue is already present in Tor's per-country user
statistics, and I believe there is a plan to transition away from these
statistics because of the safety concerns. The bucketing proposal (round to
the nearest 10) does provide some uncertainty, but it's hard to reason about
what protection it is providing.
In an ideal world, we would collect these statistics with a
privacy-preserving statistics collection tool. In fact, I think most if not
all of these could be collected with PrivCount (assuming it was extended to
support the new event types).
One useful thing about PrivCount is secure aggregation, meaning that if you
have multiple data collectors, you can securely count a total across all of
them without leaking individual inputs. In this case, it seems like there is
only one BridgeDB data source, so we would not benefit from PrivCount's
secure aggregation.
The other useful thing that PrivCount provides is differential privacy. This
is where you could get most of the benefit. Rather than rounding to 10 and
not knowing how much privacy that provides, you instead start by defining
how much privacy each statistic should achieve based on your operational
environment (these are called action bounds), and then PrivCount will add
noise to the statistics in a way that will guarantee differential privacy
under those constraints. If these constraints add too much noise for the
resulting statistics to be useful, then you have to consider if the
measurement is too privacy-invasive for the given actions you are trying to
protect and therefore you possibly shouldn't collect them.
Tor has PrivCount on the roadmap (I believe), so one option could be to
implement the non-PrivCount version now and eventually transition the
statistics to PrivCount. Another option would be to set up a PrivCount
instance using the open source tool rather than waiting for the
PrivCount-in-Tor version to be ready. In fact, if the data is collected at
BridgeDB, then I'm not sure that having PrivCount in Tor would help anyway
(unless the BridgeDB runs Tor).
There has been some work to use PrivCount for measurement and also to
explain the process of defining action bounds. I think the most relevant is
the IMC paper:
- https://torusage-imc2018.github.io
}}}
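To make the reviewer's comparison concrete, here is a minimal Python sketch
of the two approaches: bucketing a count versus adding calibrated noise. It
is an illustration under stated assumptions, not BridgeDB's or PrivCount's
actual code. The function names, the bin size of 10, and the parameter
values are hypothetical. Bucketing is assumed to round up to the next
multiple of 10 (matching the reviewer's "1-10 bucket"), and the Laplace
mechanism is used as a simple stand-in for whatever noise PrivCount adds;
the `sensitivity` parameter only loosely plays the role of an action bound.

```python
import random

BIN_SIZE = 10  # assumed bucket width, taken from "round to the nearest 10"


def bucket_count(count, bin_size=BIN_SIZE):
    """Round a non-negative count up to the next multiple of bin_size.

    Assumption: 0 stays 0 and 1..10 all become 10, which is the
    "1-10 bucket" behaviour described in the review.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return -(-count // bin_size) * bin_size  # integer ceiling division


def laplace_noise(scale, rng=random):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with mean
    `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(count, sensitivity, epsilon, rng=random):
    """Return count plus Laplace noise of scale sensitivity / epsilon.

    sensitivity: the most a single protected user may change the count
                 (hypothetical stand-in for an action bound).
    epsilon:     privacy parameter; smaller values mean more noise.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    for true_count in (0, 1, 7, 10, 11):
        print(true_count, bucket_count(true_count),
              round(noisy_count(true_count, sensitivity=1, epsilon=0.5), 2))
```

Under these assumptions, a true count of 0 is published as 0 while counts
of 1 through 10 are all published as 10, so the first user from a country
still causes a visible change. The noisy count has no such fixed threshold,
and its protection can be stated in terms of `epsilon`, at the cost of
accuracy for small counts.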
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9316#comment:25>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online