[metrics-bugs] #29315 [Metrics/Website]: Write down guidelines for adding new stats
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Apr 25 10:32:14 UTC 2019
#29315: Write down guidelines for adding new stats
-------------------------------------+--------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_revision
Priority: Very High | Milestone:
Component: Metrics/Website | Version:
Severity: Normal | Resolution:
Keywords: metrics-roadmap-2019-q2 | Actual Points:
Parent ID: | Points: 3
Reviewer: irl | Sponsor:
-------------------------------------+--------------------------------
Comment (by karsten):
Replying to [comment:14 irl]:
> I would like for these systems to be as open/transparent as is possible.
> The demarcation between a system that collects metrics and Tor Metrics
> should not just be for Tor Metrics. Anyone should be able to do what Tor
> Metrics does. This means that services publish data, and we pull from the
> service.
This sounds like a fine recommendation where this is possible. If a system
can sanitize its data by itself before making it available to us and
others, great! Let's just be clear that we're shifting complexity and
maintenance work from Tor Metrics to services run by others. If they have
the resources to do this, okay.
But let's consider whether we want to make this a hard requirement. There
may be services where we're glad that somebody runs them and where we
cannot expect them to also run sanitizing code. The options in such a case
are that we either don't get the data, or we sanitize it somewhere. And if
we can choose where to sanitize it, we can either do it as part of a
CollecTor module or in a separate tool run on the host that also runs the
service. In either case we're providing the sanitized data to others who
can then do everything that Tor Metrics does.
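To make the "sanitize on the host that runs the service" option concrete, here is a minimal sketch of what such a stand-alone sanitizing tool might look like. Everything in it is illustrative: the field layout, the placeholder address, and the hour-level timestamp truncation are assumptions for the example, not an actual sanitizing spec.

```python
# Hypothetical stand-alone sanitizer, as could be run on the service
# host before publishing data for Tor Metrics and others to pull.
# Field names and sanitizing rules are illustrative only.

def sanitize_line(line):
    """Replace the client address with a placeholder and truncate the
    timestamp to the hour, keeping only coarse-grained data."""
    fields = line.split()
    if len(fields) < 3:
        return None                        # drop malformed lines entirely
    timestamp, _client, rest = fields[0], fields[1], fields[2:]
    timestamp = timestamp[:13] + ":00:00"  # keep date and hour only
    return " ".join([timestamp, "0.0.0.0"] + rest)

def sanitize(lines):
    """Sanitize an iterable of raw log lines, discarding unparseable ones."""
    return [s for s in (sanitize_line(l) for l in lines) if s is not None]
```

The point of the sketch is that the tool needs no Tor Metrics infrastructure at all: it reads raw lines, emits sanitized lines, and the service operator publishes the output.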
However, we discussed this topic before, and it seems we still do not
quite agree. Would it help if we made this a hard requirement with the
caveat that, if somebody cannot run sanitizing code, ''we'' run it on a
machine that is not officially part of Tor Metrics?
> It does not need to be a web server. If there is not already a webserver
> then a Gopher server or TCP port that dumps out the document are also fine
> as far as I'm concerned, maybe karsten has other opinions.
Gopher? My initial reaction is that we shouldn't fall into the same
esotericism trap in which we lost the Haskell-written TorDNSEL.
I'd say let's strongly recommend a webserver, and if that's not possible,
talk to folks.
> Increasingly I'm thinking that the Tor directory protocol meta format is
> a good format to have metrics in. We already have parsers for these that
> are fast and efficient, and it's easier to detect errors due to the strict
> format (even if #30105 and similar things sometimes slip through). The
> document format also provides for signing of documents, which I'd like to
> see more of our data sources doing. #29624 is looking at defining a new
> format for exit lists, and is using the meta format with Ed25519
> signature.
Sounds good to me, as a recommendation that likely works for most new
formats. There will be exceptions: for example, keeping sanitized web
server logs in the Apache format made sense, because existing tools could
then process them directly. But yes, for most formats this is a fine
recommendation.
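For readers unfamiliar with it, the meta format's core unit is the keyword line: a keyword followed by whitespace-separated arguments. A minimal parsing sketch, assuming keyword lines only (real documents as defined in dir-spec.txt also carry multi-line objects between -----BEGIN X----- and -----END X----- markers, which this ignores), and with made-up example keywords:

```python
# Minimal sketch of parsing keyword lines in the style of the Tor
# directory protocol meta format: each item is one line consisting of
# a keyword and zero or more arguments. Not a full parser; multi-line
# PEM-style objects and signature verification are omitted.

def parse_items(document):
    """Return a list of (keyword, arguments) pairs, one per item line."""
    items = []
    for line in document.splitlines():
        if not line.strip():
            continue                       # skip blank lines
        keyword, _, args = line.partition(" ")
        items.append((keyword, args.split()))
    return items
```

The strictness irl mentions comes from this line-oriented grammar: an unexpected keyword or argument count is easy to detect and reject early.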
Would you mind taking the draft and the comments above and writing an
updated draft? I feel like if I continue owning this task, we'll need more
review rounds. Let me know!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29315#comment:15>