[metrics-bugs] #29315 [Metrics/Website]: Write down guidelines for adding new stats
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Apr 25 10:40:31 UTC 2019
#29315: Write down guidelines for adding new stats
-------------------------------------+--------------------------
Reporter: karsten | Owner: irl
Type: enhancement | Status: accepted
Priority: Very High | Milestone:
Component: Metrics/Website | Version:
Severity: Normal | Resolution:
Keywords: metrics-roadmap-2019-q2 | Actual Points:
Parent ID: | Points: 3
Reviewer: | Sponsor:
-------------------------------------+--------------------------
Changes (by irl):
* owner: karsten => irl
* status: needs_revision => accepted
* reviewer: irl =>
Comment:
Replying to [comment:15 karsten]:
> Replying to [comment:14 irl]:
> > I would like for these systems to be as open/transparent as is
> > possible. The demarcation between a system that collects metrics and Tor
> > Metrics should not just be for Tor Metrics. Anyone should be able to do
> > what Tor Metrics does. This means that services publish data, and we pull
> > from the service.
>
> This sounds like a fine recommendation where this is possible. If a
> system can sanitize its data by itself before making it available to us
> and others, great! Let's just be clear that we're shifting complexity and
> maintenance work from Tor Metrics to services run by others. If they have
> the resources to do this, okay.
>
> But let's consider whether we want to make this a hard requirement.
> There may be services where we're glad that somebody runs them and where
> we cannot expect them to also run sanitizing code. The options in such a
> case are that we either don't get the data, or we sanitize it somewhere.
> And if we can choose where to sanitize it, we can either do it as part of
> a CollecTor module or in a separate tool run on the host that also runs
> the service. In either case we're providing the sanitized data to others
> who can then do everything that Tor Metrics does.

If there is no other way to make it work, it is probably better to do the
sanitizing in CollecTor than to run it on another machine, since running it
elsewhere might split our focus and lead to us making mistakes. We could
make services publishing already-sanitized data a "very strong"
recommendation, and fall back to doing the sanitizing in CollecTor as a
last resort.
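
As a rough illustration of what a sanitizing step of this kind might
involve, wherever it ends up running, here is a minimal Python sketch. It
assumes input in Apache Common Log Format and applies two hypothetical
rules: blank the client address and user, and drop the time of day. The
rules and the name `sanitize_line` are assumptions made for illustration,
not what CollecTor actually does.

```python
import re

# Assumed input: one line in Apache Common Log Format, e.g.
# 203.0.113.7 - alice [25/Apr/2019:10:40:31 +0000] "GET / HTTP/1.1" 200 1234
LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<day>[^:\]]+):(?P<time>[^ \]]+) (?P<zone>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def sanitize_line(line):
    """Return a sanitized copy of a log line, or None if it does not parse.

    Hypothetical rules: replace the client address with 0.0.0.0, blank the
    ident and user fields, and reset the time of day to 00:00:00. The
    request itself is kept unchanged, so this sketch does NOT remove
    sensitive paths or query strings.
    """
    match = LINE.match(line)
    if match is None:
        # Drop lines we cannot parse rather than risk leaking them.
        return None
    return '0.0.0.0 - - [{day}:00:00:00 {zone}] "{request}" {status} {size}'.format(
        **match.groupdict()
    )
```

Returning None for unparseable lines is a deliberate choice in this sketch:
anything the sanitizer does not understand is withheld instead of being
passed through.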

> > It does not need to be a web server. If there is not already a
> > webserver then a Gopher server or TCP port that dumps out the document are
> > also fine as far as I'm concerned, maybe karsten has other opinions.
>
> Gopher? My initial reaction is that we shouldn't fall into the same
> esoterism trap where we also lost Haskell-written TorDNSEL.

Good point. However, what do we mean when we say "web server"? Would we
accept a server that only speaks SPDY/3, for example? We should pick a few
client libraries and require that at least one of them supports a protocol
the service offers.
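
One way to picture "pick some client libraries" is an explicit table of
supported URL schemes on the pulling side. The sketch below is only an
assumption about how that could look, using the Python standard library.
The `tcp` scheme and the names `FETCHERS`, `fetch_http`, `fetch_tcp_dump`
and `fetch` are made up for illustration.

```python
import socket
import urllib.request
from urllib.parse import urlsplit


def fetch_http(url, timeout=30):
    """Fetch a document over HTTP or HTTPS with the standard library client."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_tcp_dump(url, timeout=30):
    """Read from a TCP port that dumps the document and then closes.

    Expects a hypothetical URL of the form tcp://host:port.
    """
    parts = urlsplit(url)
    chunks = []
    with socket.create_connection((parts.hostname, parts.port), timeout=timeout) as conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # the peer closed the connection: document complete
                break
            chunks.append(chunk)
    return b"".join(chunks)


# The set of protocols we commit to supporting. A service has to offer
# at least one of these for us to be able to pull from it.
FETCHERS = {
    "http": fetch_http,
    "https": fetch_http,
    "tcp": fetch_tcp_dump,
}


def fetch(url):
    """Dispatch on the URL scheme and reject anything we have no client for."""
    scheme = urlsplit(url).scheme
    try:
        fetcher = FETCHERS[scheme]
    except KeyError:
        raise ValueError("no supported client for scheme %r" % scheme)
    return fetcher(url)
```

With a table like this, a service that only offered an unlisted protocol
would fail with a clear error instead of being silently unsupported.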

> > Increasingly I'm thinking that the Tor directory protocol meta format
> > is a good format to have metrics in. We already have parsers for these
> > that are fast and efficient, and it's easier to detect errors due to the
> > strict format (even if #30105 and similar things sometimes slip through).
> > The document format also provides for signing of documents, which I'd like
> > to see more of our data sources doing. #29624 is looking at defining a new
> > format for exit lists, and is using the meta format with Ed25519
> > signature.
>
> Sounds good to me, as a recommendation that likely works for most new
> formats. For example, having sanitized web server logs in the Apache
> format made sense, because then it was possible to use existing tools to
> process them. But yes, for most formats this is a fine recommendation.

Right, if there are already well-defined formats for certain structured
data we should reuse those.
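
To show why the strict structure of the meta format makes errors easy to
detect, here is a minimal Python sketch of a parser for its general shape:
keyword lines with optional arguments, each optionally followed by objects
between `-----BEGIN X-----` and `-----END X-----` lines. The function name
`parse_meta_format` and the keywords in the usage example are hypothetical.
The sketch does not validate keyword characters and does not verify
signatures.

```python
def parse_meta_format(text):
    """Parse a document into a list of (keyword, arguments, objects) items.

    Each object is a (type, base64_text) pair attached to the keyword line
    that precedes it. Malformed object framing raises ValueError.
    """
    items = []
    lines = iter(text.splitlines())
    for line in lines:
        if not line:
            continue  # skip blank lines
        if line.startswith("-----BEGIN "):
            if not items or not line.endswith("-----"):
                raise ValueError("unexpected object line: %r" % line)
            obj_type = line[len("-----BEGIN "):-len("-----")]
            end_line = "-----END %s-----" % obj_type
            body = []
            # Consume lines from the same iterator until the matching END.
            for obj_line in lines:
                if obj_line == end_line:
                    break
                body.append(obj_line)
            else:
                raise ValueError("unterminated object: %r" % obj_type)
            items[-1][2].append((obj_type, "".join(body)))
            continue
        keyword, _, arguments = line.partition(" ")
        items.append((keyword, arguments, []))
    return items
```

A short usage example with made-up keywords:

```python
doc = "\n".join([
    "example-stat 2019-04-25 10:00:00",
    "count 42",
    "signature",
    "-----BEGIN SIGNATURE-----",
    "QUJD",
    "REVG",
    "-----END SIGNATURE-----",
]) + "\n"

for keyword, arguments, objects in parse_meta_format(doc):
    print(keyword, arguments, objects)
```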

> Would you mind taking the draft and the comments above and writing an
> updated draft? I feel like if I continue owning this task, we'll need more
> review rounds. Let me know!

Ok, I'll pick this up and then develop the next version of the draft.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29315#comment:16>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online