[tor-bugs] #14453 [BridgeDB]: Implement statistics gathering for number of Bridges-per-Transport in BridgeDB
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Jan 28 23:35:54 UTC 2015
#14453: Implement statistics gathering for number of Bridges-per-Transport in
BridgeDB
---------------------------------------------+----------------------
Reporter: isis | Owner: isis
Type: task | Status: new
Priority: normal | Milestone:
Component: BridgeDB | Version:
Keywords: tor-bridge,bridgedb,SponsorS-pt | Actual Points:
Parent ID: | Points:
---------------------------------------------+----------------------
As part of the
[https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorS/PluggableTransports
SponsorS PT work], we promised a way to gather statistics on the number of
bridges per transport.
The proposal states this is a task for Metrics. However, it's possible to
do this on the BridgeDB side. In fact, it would help BridgeDB in the
future to determine how to better allocate bridges to its Distributors
(and help the Distributors hand them out to users in smarter ways).
Technically, BridgeDB already sort-of has data on the number of
Bridges-per-Transport. When a client requests a certain type of bridge
from a certain Distributor (e.g. "give me an IPv4 obfs3 bridge from the
HTTPS Distributor"), BridgeDB creates (or retrieves from a cache) a
"filtered" subhashring containing only the Bridges which fit the client's
request. BridgeDB even logs the number of Bridges in these subhashrings in
its DEBUG and INFO logs:
{{{
22:19:16 INFO L1361:Bridges.addRing() Bridges inserted into HTTPS-Transpo subring: 235
22:19:16 DEBUG L75:Dist.getNumBridgesPerA() Returning 3 bridges from ring of len: 235
}}}
The problem with using those numbers for statistics is that each of
BridgeDB's Distributors may have multiple adjacent subhashrings, usually
about 5. So, in the above case, there are roughly 5 * 235 = 1175 obfs3
bridges in the HTTPS Distributor. (These numbers aren't from the real
deployed BridgeDB, by the way.)
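The rough estimate above can be written out as a small sketch. It assumes
the INFO message has been re-joined onto a single line, and that the
Distributor has about 5 subhashrings of similar size. `SUBRING_RE` and
`estimate_total` are hypothetical names, not part of BridgeDB.

```python
import re

# Matches the bridge count at the end of a (re-joined) addRing() INFO line.
SUBRING_RE = re.compile(r"subring:\s*(\d+)\s*$")

def estimate_total(log_line, num_subrings=5):
    """Roughly estimate the bridges in a Distributor from one subring.

    Assumes every subhashring holds about the same number of bridges.
    Returns None if the line does not contain a subring count.
    """
    match = SUBRING_RE.search(log_line)
    if match is None:
        return None
    return int(match.group(1)) * num_subrings

line = ("22:19:16 INFO L1361:Bridges.addRing() Bridges inserted into "
        "HTTPS-Transpo subring: 235")
print(estimate_total(line))
```

This is only as good as the guess at the number of subhashrings, which is
why it isn't a sound basis for published statistics.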
---------
A better way to do this would be to provide a database query (as part of
#12031) which counts the number of Bridges which claim to offer a PT. An
example mechanism for doing this in Redis would be to keep one hash per
Bridge which has any PTs (i.e. using [http://redis.io/commands/hset HSET]
or `HINCRBY`), keyed by the Bridge fingerprint, with a field for each type
of PT. If not using `HINCRBY`, the value stored in each field would be
`IP:PORT[,IP:PORT[,IP:PORT[…]]]`, for example:
{{{
redis> HSET 26F6A7570E0F655DFDD054E79ACBB127112C2D7B obfs4 "4.4.4.4:4444,5.5.5.5:5555"
}}}
With that scheme, a new `HSET` would be necessary for each Bridge and PT
each time the `@type bridge-extrainfo` descriptors are parsed, but each
`HSET` only has time complexity O(1).
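To make the scheme concrete, here is a minimal sketch which models the
Redis keyspace with a plain Python dict, so that it runs without a Redis
server. The `store`, `hset`, and `bridges_per_transport` names are
hypothetical, and the second bridge is made up for illustration.

```python
from collections import Counter

# Stand-in for the Redis keyspace: {fingerprint: {transport: "IP:PORT,..."}}.
store = {}

def hset(fingerprint, transport, addresses):
    """Mimic `HSET fingerprint transport addresses` (O(1) per call)."""
    store.setdefault(fingerprint, {})[transport] = addresses

def bridges_per_transport():
    """Count how many Bridges claim to offer each transport."""
    counts = Counter()
    for fields in store.values():
        # Each field name is a transport type; count each Bridge once.
        counts.update(fields.keys())
    return counts

hset("26F6A7570E0F655DFDD054E79ACBB127112C2D7B", "obfs4",
     "4.4.4.4:4444,5.5.5.5:5555")
# Hypothetical second bridge, for illustration only.
hset("A" * 40, "obfs3", "6.6.6.6:6666")
hset("A" * 40, "obfs4", "6.6.6.6:7777")

print(bridges_per_transport())
```

Note that in this layout each write is cheap, but answering "how many
Bridges have obfs4?" means walking every Bridge's hash.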
Some considerations / additional query parameters:
* For these statistics, should we only count Bridges with the Running
flag? Or only if the OONI machine says the PT is reachable?
* What sanitisations should be done on these numbers? Should we round
them? Or provide a scale, i.e. "between 1000-5000 obfs4 bridges"?
* Do we want only the ''Bridges'' with a given PT? Or do we want the
''number of instances'' of a given PT (e.g. if a Bridge has multiple obfs3
instances)?
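Two of the questions above can be illustrated with a short sketch: the
difference between counting Bridges and counting PT instances, and two
possible sanitisations (rounding, or reporting a range). It assumes the
`IP:PORT,IP:PORT` value format from the Redis example. All function names,
the sample data, the rounding step, and the range edges are hypothetical
choices, not decisions.

```python
import bisect

# Sample data in the {fingerprint: {transport: "IP:PORT,..."}} layout.
sample = {
    "A" * 40: {"obfs3": "1.1.1.1:1111,1.1.1.1:2222"},
    "B" * 40: {"obfs3": "2.2.2.2:3333", "obfs4": "2.2.2.2:4444"},
}

def count_bridges(store, transport):
    """Number of Bridges with at least one instance of `transport`."""
    return sum(1 for fields in store.values() if fields.get(transport))

def count_instances(store, transport):
    """Number of IP:PORT instances of `transport` across all Bridges."""
    return sum(len(fields[transport].split(","))
               for fields in store.values() if fields.get(transport))

def round_to_step(count, step=100):
    """Round a non-negative count to the nearest multiple of `step`."""
    return (count + step // 2) // step * step

def to_range(count, edges=(0, 100, 500, 1000, 5000)):
    """Return the (low, high) range containing a non-negative count.

    `high` is None when the count is at or above the last edge.
    """
    i = bisect.bisect_right(edges, count)
    high = edges[i] if i < len(edges) else None
    return (edges[i - 1], high)

print(count_bridges(sample, "obfs3"), count_instances(sample, "obfs3"))
print(round_to_step(1175), to_range(1175))
```

Filtering on the Running flag or on OONI reachability would need extra
per-Bridge data which this layout doesn't hold.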
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/14453>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online