[tor-dev] Regarding Metrics project
Karsten Loesing
karsten.loesing at gmx.net
Thu Nov 17 06:51:40 UTC 2011
Hi Qiang,
I'm cc'ing tor-dev, because I think your original request to work on
Java codebases went there.
On 11/17/11 2:13 AM, Qiang Wang wrote:
> This is Qiang, and I am very interested in Tor project, and want to
> contribute something. I was wondering if I can provide any help about
> metrics project?
>
> Could you please let me know your recommendations?
Sure thing. Three ideas come to mind:
- Improve the consensus-health script: We have a Java program that
downloads the network statuses from all eight directory authorities and
compares them to each other to detect problems. One of the outputs is a
web page [0], another output is a status email [1], and a third is a
message sent to an IRC bot [2]. The code [3] needs some love.
- Answer the question of what fraction of exit relays exit from a
different IP address than the one in their descriptor. This is a typical
question that can be
answered using the metrics data [4] we have. We'd want to publish the
analysis code in the metrics-tasks repository [5], so that others can
reproduce the results or refine the analysis. There's already a lot of
Java code in that repository, because that's what I use when analyzing
metrics data. We have plenty of other analysis questions similar to
this one, so if you don't like this one, take a look at the Analysis
component in our bug tracker [6].
- Implement an efficient relay-search database. We have a web site for
searching relays by IP, nickname, or fingerprint [7], but it's really
slow. The main problem is that the database originally was designed for
aggregating statistics about relays, not for searching relays. I have
two ideas here: The first is to design a separate PostgreSQL database
[8] and go crazy with indexes, the second is to try out CouchDB for this
[9]. This task isn't really that Java-specific, except for the fact
that I'm using Java to import data and send queries.
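To make the second idea a bit more concrete, here is a minimal Java
sketch of the final counting step. It assumes the metrics data has
already been parsed into one object per exit relay. The class, field,
and method names below are hypothetical and not taken from
metrics-tasks, and how the observed exit addresses are obtained is left
out entirely.

```java
import java.util.List;
import java.util.Set;

public class ExitAddressAnalysis {

  /** Hypothetical container for one exit relay; not an existing class. */
  public static final class ExitRelay {
    final String fingerprint;
    final String descriptorAddress;
    final Set<String> observedExitAddresses;

    public ExitRelay(String fingerprint, String descriptorAddress,
        Set<String> observedExitAddresses) {
      this.fingerprint = fingerprint;
      this.descriptorAddress = descriptorAddress;
      this.observedExitAddresses = observedExitAddresses;
    }
  }

  /**
   * Returns the fraction of relays that were seen exiting from at least
   * one address other than the one in their descriptor.  Relays without
   * any observed exit address count as "not different".  Returns 0.0
   * for an empty list.
   */
  public static double fractionExitingFromDifferentAddress(
      List<ExitRelay> relays) {
    if (relays.isEmpty()) {
      return 0.0;
    }
    int differing = 0;
    for (ExitRelay relay : relays) {
      for (String observed : relay.observedExitAddresses) {
        if (!observed.equals(relay.descriptorAddress)) {
          differing++;
          break;  // count each relay at most once
        }
      }
    }
    return (double) differing / relays.size();
  }
}
```

Whether a relay with no observed exit address should be counted at all
is an analysis decision; the sketch keeps such relays in the
denominator, which may or may not be what we want in the end.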
If anything sounds interesting to you, please let me know. I have more
information about these tasks and can help you get started. Also look
at the research section of the metrics website [10] to learn more.
Best,
Karsten
[0] https://metrics.torproject.org/consensus-health.html
[1] https://lists.torproject.org/pipermail/tor-consensus-health/2011-November/000026.html
[2] See the IRC bot nsa in #tor-bots on OFTC, e.g., "< nsa> or:
[consensus-health] The following directory authorities set conflicting
or invalid consensus parameters: ides bwauthbestratio=1 bwauthcircs=0
bwauthdescbw=1 bwauthkp=10000 bwauthpid=1 bwauthtd=0 bwauthti=0
bwauthtidecay=5000 cbtnummodes=3 refuseunknownexits=1"
[3] https://gitweb.torproject.org/metrics-web.git/tree/HEAD:/src/org/torproject/chc
[4] https://metrics.torproject.org/data.html
[5] https://gitweb.torproject.org/metrics-tasks.git
[6] https://trac.torproject.org/projects/tor/query?status=!closed&component=Analysis&order=priority
[7] https://metrics.torproject.org/relay-search.html
[8] https://trac.torproject.org/projects/tor/ticket/2922
[9] https://trac.torproject.org/projects/tor/ticket/4440
[10] https://metrics.torproject.org/research.html