[tor-bugs] #4439 [Metrics Utilities]: Develop a Java/Python API that wraps relay descriptor sources and provides unified access to them
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Tue Nov 8 09:35:49 UTC 2011
#4439: Develop a Java/Python API that wraps relay descriptor sources and provides
unified access to them
-------------------------------+--------------------------------------------
Reporter: karsten | Owner: karsten
Type: task | Status: new
Priority: normal | Milestone:
Component: Metrics Utilities | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------------+--------------------------------------------
Quite a few metrics tools are processing archived and current relay
descriptors to provide aggregate statistics, make descriptor archives
searchable, or monitor the Tor network. These tools have a non-trivial
amount of code in common that imports relay descriptors from various
sources. Copying code is bad. Let's write an API that all these metrics
tools can use and that facilitates developing new tools.
Note that this API is different from existing Tor controller APIs, which
connect to a Tor process's control port and provide descriptors that the
Tor process knows about. The new API doesn't need to connect to a Tor
control port, even though that would be possible. It may, however, read
the cached descriptors from a Tor data directory, along with importing
relay descriptors from other sources. Of course, the two APIs can be
combined, but there's also a reason for the API described here to exist
separately: none of the metrics tools needs to control a Tor process.
There are two major sources for relay descriptors:
- Local directories: We can read relay descriptors from the cached-*
files of a local Tor data directory or from the output directory of the
directory-archive script or metrics-db. Some of these local directories
can grow quite large, so we'll need an efficient way to skip descriptors
that we already know. Also, some files in these directories contain
multiple relay descriptors while others contain only one.
We'll want to support an arbitrary number of local directories in the new
API.
- Directory authorities/mirrors: We can download relay descriptors from
the directory authorities or directory mirrors via Tor's directory
protocol. We should keep downloads to a minimum and only download
missing descriptors. We should also download compressed descriptors if
possible. In some cases we're interested in whether a directory authority
serves a descriptor at all (e.g., consensus-health script). In most cases
we want to set a timeout for downloading descriptors.
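The download requirements above (fetch only missing descriptors, prefer
compressed responses, apply a timeout) could be sketched in Python roughly
as follows. This is only an illustration, not a proposed implementation:
the function names are made up for this example, the URL layout follows
the /tor/server/d/<D1>+<D2> form of Tor's directory protocol with the ".z"
suffix for compressed responses, and the sketch assumes that the response
body is a plain zlib stream.

```python
import urllib.request
import zlib


def missing_digests(wanted, known):
    """Return the descriptor digests we want but don't have yet."""
    return set(wanted) - set(known)


def descriptor_url(host, port, digests, compressed=True):
    """Build a directory-protocol URL for server descriptors by digest.

    Digests are joined with '+'; '.z' requests a compressed response.
    """
    resource = "/tor/server/d/" + "+".join(sorted(digests))
    if compressed:
        resource += ".z"
    return "http://%s:%d%s" % (host, port, resource)


def decode_body(body, compressed=True):
    """Decompress a response body (assumption: plain zlib stream)."""
    return zlib.decompress(body) if compressed else body


def fetch_descriptors(host, port, wanted, known, timeout=60.0):
    """Download only the missing descriptors, compressed, with a timeout."""
    todo = missing_digests(wanted, known)
    if not todo:
        return b""  # nothing missing, so don't contact the directory at all
    url = descriptor_url(host, port, todo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_body(response.read())
```

A tool that only wants to know whether a directory authority serves a
descriptor could call fetch_descriptors() for a single digest and catch
urllib.error.HTTPError to tell "not served" apart from a successful reply.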
We should design the new API so that it's stateless with respect to
different executions and doesn't have its own configuration. A tool that
uses the API should first initialize it by creating relay descriptor data
sources and then request descriptors to process.
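To make the intended usage pattern more concrete, here is a rough Python
sketch of a stateless API for local directories. All names
(LocalDirectorySource, split_descriptors, read_all) are hypothetical and
only serve this example. The calling tool, not the API, keeps the set of
already-known files between executions and passes it in. Splitting files
at lines starting with "router " is a simplification that only fits
server descriptors and ignores annotation lines.

```python
import os


def split_descriptors(text):
    """Split text into descriptors, each starting at a 'router ' line."""
    current = []
    for line in text.splitlines(keepends=True):
        if line.startswith("router ") and current:
            yield "".join(current)
            current = []
        current.append(line)
    if current:
        yield "".join(current)


class LocalDirectorySource:
    """Hypothetical data source for one local descriptor directory."""

    def __init__(self, path):
        self.path = path

    def read(self, known_files):
        """Yield (file_path, descriptor_text), skipping known files."""
        for root, _dirs, names in os.walk(self.path):
            for name in sorted(names):
                file_path = os.path.join(root, name)
                if file_path in known_files:
                    continue  # caller has already processed this file
                with open(file_path, encoding="utf-8",
                          errors="replace") as f:
                    for descriptor in split_descriptors(f.read()):
                        yield file_path, descriptor


def read_all(sources, known_files):
    """Request descriptors from an arbitrary number of data sources."""
    for source in sources:
        yield from source.read(known_files)
```

A tool would create one LocalDirectorySource per directory, call
read_all() with the file paths it remembers from earlier runs, and store
the newly seen paths itself. That keeps the API free of state and
configuration of its own.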
The following tools may use the new API once it's ready: metrics-db, the
part of metrics-web that aggregates statistics, the ExoneraTor database,
the relay search database, the consensus-health script, the
descriptor-health script, and the basic monitoring infrastructure.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4439>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online