[metrics-bugs] #28116 [Metrics/Statistics]: Split up legacy module into more maintainable parts
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Oct 19 08:14:45 UTC 2018
#28116: Split up legacy module into more maintainable parts
------------------------------------+----------------------
Reporter: karsten                   | Owner: karsten
Type: enhancement                   | Status: assigned
Priority: Medium                    | Milestone:
Component: Metrics/Statistics       | Version:
Severity: Normal                    | Keywords:
Actual Points:                      | Parent ID:
Points:                             | Reviewer:
Sponsor:                            |
------------------------------------+----------------------
Our legacy module is a mess. That code dates back to a time when we tried
to use a single database for all our statistics and for a service called
relay search, which was not the same service as today's relay search.
While I'm not ruling out that we can make a single-database approach work
for everything we want to do with our data, it's not going to be ''this''
database.
It's time to move away from this legacy database and take an approach
similar to the one we're taking for the other modules, where we store only
the parts that we need for our graphs.
As of now, the legacy module provides data for the following graphs:
- In the Servers category:
  1. Relays and bridges
  2. Relays by relay flag
  3. Relays by tor version
  4. Relays by platform
- In the Traffic category:
  5. Total relay bandwidth
  6. Advertised and consumed bandwidth by relay flag
  7. Consumed bandwidth by Exit/Guard flag combination
  8. Bandwidth spent on answering directory requests
Viewed from a different perspective, these 8 graphs show 3 different
metrics:
- Relay or bridge counts in graphs 1 to 4
- Advertised bandwidths in graphs 5 and 6
- Bandwidth histories in graphs 5 to 8
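The grouping above can be written down as a small lookup table. This is only an illustrative sketch: the graph numbers and titles come from the list in this ticket, while the metric labels (`counts`, `advertised_bandwidth`, `bandwidth_history`) and the helper `metrics_for` are names made up for this example and do not exist in the metrics code.

```python
# Graph numbers and titles as listed in this ticket.
GRAPHS = {
    1: "Relays and bridges",
    2: "Relays by relay flag",
    3: "Relays by tor version",
    4: "Relays by platform",
    5: "Total relay bandwidth",
    6: "Advertised and consumed bandwidth by relay flag",
    7: "Consumed bandwidth by Exit/Guard flag combination",
    8: "Bandwidth spent on answering directory requests",
}

# Hypothetical metric labels mapped to the graphs that show them.
METRICS = {
    "counts": {1, 2, 3, 4},              # relay or bridge counts
    "advertised_bandwidth": {5, 6},      # advertised bandwidths
    "bandwidth_history": {5, 6, 7, 8},   # bandwidth histories
}


def metrics_for(graph):
    """Return the sorted metric labels that a given graph number shows."""
    return sorted(name for name, graphs in METRICS.items() if graph in graphs)


if __name__ == "__main__":
    for number, title in GRAPHS.items():
        print(number, title, metrics_for(number))
```

Graphs 5 and 6 are the only ones that combine two metrics, which is why they come up in more than one of the changes suggested below.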
I could imagine that we make the following changes to split up the legacy
module into more maintainable parts:
1. Use existing data from the ipv6servers module for graph 1 and for the
advertised bandwidth portions in graphs 5 and 6. This data already exists,
with only trivial differences in how we treat missing data. We could just
switch.
2. Extend the ipv6servers module to also provide data for graphs 2 to 4.
This extension would require us to reimport the entire archive, so it's
more of a rewrite. But the ipv6servers module code is much cleaner and
easier to extend than the legacy module code. And when we extend that
module, we can relatively easily add bridge statistics and other relay
metrics like consensus weight or path selection probabilities that we can
use in new graphs later on. All in all, not a trivial amount of work, but
probably worth it.
3. Keep the remaining parts of the legacy module for the bandwidth
histories in graphs 5 to 8. Bandwidth histories are going to be
replaced by PrivCount data in the medium term anyway. We could keep the
legacy module around for another year or two without planning to change
much during that time. And when we shut it down, we can keep a copy of the
aggregate data around, just like we're going to keep a static summary of
the Tor Messenger statistics (#26047).
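To see which module each graph would depend on if all three changes were made, the plan can be sketched as a table of assignments. This is a rough illustration under assumptions, not the actual implementation: `ipv6servers` and `legacy` are the module names used in this ticket, but the metric labels, the `PROPOSED_CHANGES` structure and the function `modules_by_graph` are hypothetical.

```python
from collections import defaultdict

# (change number, providing module, hypothetical metric label, graph numbers)
PROPOSED_CHANGES = [
    (1, "ipv6servers", "counts", [1]),
    (1, "ipv6servers", "advertised_bandwidth", [5, 6]),
    (2, "ipv6servers", "counts", [2, 3, 4]),
    (3, "legacy", "bandwidth_history", [5, 6, 7, 8]),
]


def modules_by_graph(changes):
    """Map each graph number to the sorted list of modules it would read from."""
    result = defaultdict(set)
    for _change, module, _metric, graphs in changes:
        for graph in graphs:
            result[graph].add(module)
    return {graph: sorted(modules) for graph, modules in sorted(result.items())}


if __name__ == "__main__":
    for graph, modules in modules_by_graph(PROPOSED_CHANGES).items():
        print(graph, modules)
```

Under this reading, graphs 1 to 4 would depend only on the ipv6servers module, graphs 7 and 8 only on the legacy module, and graphs 5 and 6 on both until bandwidth histories are replaced.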
I'll start working on the second suggested change above. The two other
changes depend on whether that second change can be made successfully.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28116>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online