[tor-bugs] #2680 [Metrics]: present bridge usage data so researchers can focus on the math
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Fri Mar 11 14:06:58 UTC 2011
#2680: present bridge usage data so researchers can focus on the math
---------------------+------------------------------------------------------
Reporter: arma | Owner: karsten
Type: task | Status: assigned
Priority: normal | Milestone:
Component: Metrics | Version:
Keywords: | Parent:
Points: | Actualpoints:
---------------------+------------------------------------------------------
Changes (by karsten):
* status: new => assigned
Comment:
Here's my first attempt at presenting bridge usage data in a way that is
more useful to researchers:
We have (at least) four data sources that are relevant for analyzing
bridge usage:
1. ''Bridge descriptors'': Bridges publish bridge descriptors to the
bridge authority at least once every 18 hours.
2. ''Bridge network statuses'': The bridge authority forms an opinion on
all bridges that published a descriptor recently, decides whether it
considers them as running, and writes these opinions to a bridge network
status document every 30 minutes.
3. ''BridgeDB pool assignments'': BridgeDB learns about currently
running bridges from the bridge authority and either allocates these
bridges to distributors like email or https, or keeps them unallocated
for manual distribution.
4. ''Relay consensuses'': The directory authorities vote on running
relays (not bridges) every hour and publish a network status consensus.
If a bridge uses the same identity key that it also used as a relay, it
might observe more users than it would observe as a pure bridge.
Therefore, bridges that have been running as relays before should be
excluded from bridge statistics.
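The exclusion step described in item 4 could be sketched roughly as follows. This is only an illustration, not the code from the metrics-tasks repository: the function name `exclude_former_relays`, the input shapes, and the sample identifiers are all hypothetical. It also assumes that bridge and relay identifiers are available in the same form (for example, both hashed the same way) so that they can be compared directly.

```python
from typing import Dict, Iterable, Set


def exclude_former_relays(
    bridge_users: Dict[str, float],
    relay_ids: Iterable[str],
) -> Dict[str, float]:
    """Drop bridges whose identifier was also seen in a relay consensus."""
    # Normalize case and whitespace so hex identifiers compare equal
    # regardless of how each source formats them.
    seen_as_relay: Set[str] = {rid.strip().upper() for rid in relay_ids}
    return {
        bridge_id: users
        for bridge_id, users in bridge_users.items()
        if bridge_id.strip().upper() not in seen_as_relay
    }


# Hypothetical example data (not taken from the January 2011 tarball).
bridge_users = {"AAAA": 12.0, "BBBB": 340.0, "CCCC": 7.5}
relay_ids = ["bbbb", "DDDD"]  # identifiers seen in relay consensuses

pure_bridges = exclude_former_relays(bridge_users, relay_ids)
print(sorted(pure_bridges))  # prints ['AAAA', 'CCCC']
```

Here "BBBB" is dropped because the same identifier shows up among the relays, so its user numbers would not reflect pure bridge usage.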
When Roger and I talked about this idea on IRC, I thought that we could
merge data from these 4 sources into a single file. Let's step back. We
should start with 4 data formats that are easier to parse than the current
data sources and let researchers assemble the files themselves. We can
discuss merging these 4 data formats into 1 at a later time.
I wrote two Java programs to parse the data on the metrics website and
generate 3 of these 4 data formats. (We're still in the process of
patching BridgeDB to dump its pool assignments to a file for the 3rd data
source in the list above. Once we're done with that, I'll write another
Java program to provide the remaining data format.) I can integrate these
programs into metrics-db and provide these formats on a daily basis, but
before doing so, I'd like to know whether the formats are useful to people
at all.
I uploaded a tarball of the three new data formats for
[http://freehaven.net/~karsten/volatile/bridge-usage-data-2011-01.tar.bz2
January 2011] (39M). The source code to transform our standard tarballs
into the new data formats and a more detailed description of the data
formats are in the
[https://gitweb.torproject.org/metrics-tasks.git/tree/HEAD:/task-2680 metrics-tasks repository].
I'm going to make the 3rd data format (BridgeDB pool assignments) for
January 2011 available as soon as I have it (hopefully a week from
now).
Also, I'm going to ignore the research questions listed in the ticket
description above and let others answer them.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2680#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online