[metrics-bugs] #9316 [Obfuscation/BridgeDB]: BridgeDB should export statistics
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Apr 23 20:12:45 UTC 2019
#9316: BridgeDB should export statistics
-------------------------------------------+---------------------------
Reporter: asn | Owner: dgoulet
Type: task | Status: assigned
Priority: Medium | Milestone:
Component: Obfuscation/BridgeDB | Version:
Severity: Normal | Resolution:
Keywords: metrics, bridgedb, prometheus | Actual Points:
Parent ID: #19332 | Points: 3
Reviewer: | Sponsor: Sponsor19
-------------------------------------------+---------------------------
Comment (by dcf):
Replying to [comment:16 phw]:
> Here's a preliminary list of statistics that we may want, and why we
> want them. Needless to say, we need to figure out how to collect these
> statistics safely.
If it's possible, I would like to have a guess at what fraction of bridge
requesters are bots. Proxy-distribution papers usually assume that an
adversary controls some fraction of the users--it would be great to know
what the fraction is in this case. For example
[https://censorbib.nymity.ch/#Mahdian2010a Mahdian2010a] "''n'' users,
''k'' of whom [are] adversaries," [https://censorbib.nymity.ch/#Wang2013a
Wang2013a] "Let ''f'' denote the fraction of malicious users among all
potential bridge users.... We expect a typical value of ''f'' between 1%
and 5%...."
Here are some possible ways to identify bots:
* IP address clustering--for example if BridgeDB considers all addresses
in a /24 the same, find the most commonly occurring /20
* auto-generated email addresses following a pattern
* to start, you could make a histogram of the lengths of email
addresses and see if it's concentrated at a single point, or count the
frequency of short prefixes and suffixes of email address local-parts and
see if any appear overwhelmingly more often than others.
* an anachronistic HTTP User-Agent (for example, Chrome from 2 years ago,
when most real Chrome users auto-update)
* inconsistent HTTP headers, for example Chrome or Firefox without
`Accept-Encoding: gzip`
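A minimal sketch of the first two heuristics above, assuming the request
log is already available as plain lists of IPv4 address strings and email
addresses. The function names and the input format are hypothetical and
not part of BridgeDB, and the sketch ignores the question of how to
collect these inputs safely.

```python
import ipaddress
from collections import Counter


def most_common_networks(addresses, prefix_len=20, top=3):
    """Count requests per enclosing network of the given prefix length.

    addresses: iterable of IPv4 address strings (assumed input format).
    Returns the `top` most frequent networks with their request counts.
    """
    counts = Counter(
        # strict=False masks off the host bits, e.g. 10.0.1.5/20 -> 10.0.0.0/20
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    )
    return counts.most_common(top)


def local_part_stats(emails, affix_len=3):
    """Histogram of local-part lengths, plus short prefix and suffix counts."""
    lengths, prefixes, suffixes = Counter(), Counter(), Counter()
    for email in emails:
        # Everything before the last "@" is treated as the local-part.
        local = email.rsplit("@", 1)[0].lower()
        lengths[len(local)] += 1
        prefixes[local[:affix_len]] += 1
        suffixes[local[-affix_len:]] += 1
    return lengths, prefixes, suffixes
```

A length histogram concentrated on one value, or a single prefix or suffix
that dominates the counts, would be a hint (not proof) of auto-generated
addresses.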
Given some sort of bot-classification heuristic, it would be good to
analyze the statistics you mentioned already (e.g. fraction
allowed/denied) separately for bot and non-bot requests.
I would like to see a graph that shows how long it takes for a single
bridge to be given to ''n'' different requesters. When BridgeDB starts
distributing a bridge, how long does it take before 5 people know about
it? Before 50 people know about it?
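The time-to-''n''-requesters measurement could be sketched as follows,
assuming a log of (timestamp, bridge, requester) tuples. The tuple layout
and the `requester_id` field are hypothetical; in practice it would have
to be some privacy-preserving stand-in for a requester, which this sketch
does not address.

```python
def time_to_reach(events, n):
    """Seconds from a bridge's first distribution until n distinct
    requesters have received it.

    events: iterable of (timestamp_seconds, bridge_id, requester_id).
    Returns {bridge_id: elapsed_seconds}; bridges that never reach n
    distinct requesters are omitted.
    """
    first_seen = {}   # bridge_id -> timestamp of first distribution
    requesters = {}   # bridge_id -> set of distinct requester ids
    reached = {}      # bridge_id -> elapsed seconds when n was reached
    for ts, bridge, requester in sorted(events, key=lambda e: e[0]):
        first_seen.setdefault(bridge, ts)
        seen = requesters.setdefault(bridge, set())
        seen.add(requester)
        if bridge not in reached and len(seen) >= n:
            reached[bridge] = ts - first_seen[bridge]
    return reached
```

Calling this for several values of ''n'' (for example 5 and 50) gives the
points needed for the graph described above.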
> * Approximate number of ''HTTPS'' requests coming from proxies.
> * This may be an indicator of people trying to game the system.
On this point, specifically I would want to know what fraction of
requests have an `X-Forwarded-For` or `Via` header, ''and'' how many
entries it contains. I mention this because these headers can indicate
the use of a proxy, but a client may also spoof them. And I seem to
remember that BridgeDB may process `X-Forwarded-For` incorrectly, reading
the entries in the wrong order when there is more than one.
For this analysis, you will have to be aware that requests via Moat always
have at least one `X-Forwarded-For` (I believe), because Moat is
implemented using an Apache `ProxyPass` reverse proxy and Apache adds that
header.
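Counting header entries per request might look like the sketch below. It
assumes headers arrive as a simple name-to-value dict and that Moat
contributes exactly one `X-Forwarded-For` entry, as suggested above; both
are assumptions, and `count_proxy_entries` is a hypothetical helper, not
BridgeDB code. Splitting on commas is also a simplification for `Via`,
whose entries may carry comments.

```python
def count_proxy_entries(headers, via_moat=False):
    """Return (xff_entries, via_entries) for one request.

    headers: dict mapping header name to its raw value; names are
        matched case-insensitively.
    via_moat: if True, discount the one X-Forwarded-For entry assumed
        to be added by the Apache reverse proxy in front of Moat.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    def entries(name):
        # Both headers are comma-separated lists; skip empty elements.
        raw = lowered.get(name, "")
        return [part.strip() for part in raw.split(",") if part.strip()]

    xff = len(entries("x-forwarded-for"))
    if via_moat and xff > 0:
        xff -= 1
    return xff, len(entries("via"))
```

Aggregating the first element of the result over all requests gives both
the fraction of requests with a nonzero count and the distribution of
entry counts.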
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9316#comment:19>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online