[metrics-bugs] #29624 [Metrics/Exit Scanner]: New version of exit list format
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Mar 7 14:01:47 UTC 2019
#29624: New version of exit list format
-------------------------------------+------------------------------
Reporter: irl | Owner: karsten
Type: task | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Exit Scanner | Version:
Severity: Normal | Resolution:
Keywords: metrics-roadmap-2019-q2 | Actual Points:
Parent ID: #29650 | Points:
Reviewer: irl | Sponsor:
-------------------------------------+------------------------------
Changes (by karsten):
* status: needs_revision => needs_review
Comment:
Replying to [comment:7 notirl]:
> We need to work on the use of words like "may". Unless Tor already has
> something for this, let's refer to RFC2119.
Makes sense. However, it's been a while since I wrote specs with those
keywords, and I think I didn't get it right in all cases back then. Do you
mind going through the spec at the end and correcting keywords
accordingly?
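A review pass like this could be partly automated. The following is a minimal sketch, not part of any existing Tor tooling: it flags RFC 2119 requirement keywords that appear in a spec text without capitals, so a reviewer can decide whether each one is meant normatively. The function name `find_informal_keywords` and the sample text are made up for illustration.

```python
import re

# Requirement keywords defined by RFC 2119. Two-word forms come first so
# that "must not" is matched as one keyword rather than as "must".
RFC2119_KEYWORDS = (
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
)

_PATTERN = re.compile(
    r"\b(" + "|".join(k.replace(" ", r"\s+") for k in RFC2119_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def find_informal_keywords(text):
    """Return (line_number, word) pairs for keywords not written in capitals."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            word = match.group(1)
            if word != word.upper():
                hits.append((number, word))
    return hits


# Hypothetical spec excerpt, used only to exercise the function.
sample = (
    "The scanner may omit this line.\n"
    "Parsers MUST ignore unknown keywords.\n"
    "It should not fail."
)
hits = find_informal_keywords(sample)
```

Each hit still needs a human decision: a lowercase "may" is sometimes plain English rather than a requirement.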
> I don't believe we need to prefix keywords with "Scanner". Was there a
> specific reason for this?
My idea was to avoid future conflicts with keywords used in exit list
entries, and the header is where slightly longer keywords matter the
least. I don't feel strongly, though. Mild preference for keeping the
prefix.
> dir-spec uses kebab-case for keywords, not CamelCase.
>
> For fields that are already defined in dir-spec, like "contact" we
> should refer to those semantics instead of making up our own.
Hmm, should we really mix CamelCase and kebab-case in a single document? I
think I'd prefer to stay in CamelCase notation.
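If the spec did move to kebab-case, the renaming itself would be mechanical. The following is a minimal sketch of the mapping, assuming keywords are plain CamelCase without embedded acronyms; `camel_to_kebab` is a made-up helper and `ScannerContact` is a hypothetical keyword name, used only to show the effect on a prefixed keyword.

```python
import re


def camel_to_kebab(keyword):
    """Convert a CamelCase keyword to kebab-case."""
    # Insert a hyphen before every capital letter except the first one,
    # then lowercase the whole keyword.
    return re.sub(r"(?<!^)(?=[A-Z])", "-", keyword).lower()


# "Downloaded" is mentioned in this ticket; "ScannerContact" is hypothetical.
examples = {name: camel_to_kebab(name) for name in ("Downloaded", "ScannerContact")}
```

Note that this naive rule would split an acronym letter by letter, so such keywords would need a manual mapping.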
> As above, for date/time formats.
Hmm? I copied over the format from dir-spec. The formats should be
equivalent. Or what do you mean?
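For reference, a parser could handle such a timestamp as sketched below. This assumes the dir-spec style "YYYY-MM-DD HH:MM:SS" with times in UTC; the function names are made up for illustration and are not taken from any existing parser.

```python
from datetime import datetime, timezone

# Assumed format: "YYYY-MM-DD HH:MM:SS", interpreted as UTC.
DIR_SPEC_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_timestamp(value):
    """Parse a timestamp string into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, DIR_SPEC_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def format_timestamp(moment):
    """Format a timezone-aware datetime back into the same string form."""
    return moment.astimezone(timezone.utc).strftime(DIR_SPEC_TIME_FORMAT)


parsed = parse_timestamp("2019-03-07 14:01:47")
```

`strptime` raises `ValueError` for strings that don't match the format, which is the behavior a strict parser would likely want.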
> We should be specific on our use of country codes. There are extensions
> added by the databases we are using, and we also use our own extensions.
> Maybe we should talk to OONI and see what they are using too so we can be
> unified.
I'm not sure what we'd gain from defining (or linking to) a set of allowed
country codes. I consider this field mostly informational. But I don't
really mind. In any case we could move forward with completing this spec
and writing parsers, and we could later adapt the spec to define a subset
of valid two-letter country codes.
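To illustrate that staged approach, here is a minimal sketch of a check that starts out purely syntactic and can later be tightened with an allowed set. The function name, the lowercase normalization, and the example set are all assumptions; extension codes containing characters other than letters would be rejected by this version.

```python
import re

_TWO_LETTERS = re.compile(r"[a-z]{2}")


def check_country_code(code, allowed=None):
    """Return True if code looks like a two-letter country code.

    If allowed is given (a set of lowercase codes), the code must also be
    a member of that set. Without it, only the syntax is checked.
    """
    normalized = code.strip().lower()
    if not _TWO_LETTERS.fullmatch(normalized):
        return False
    return allowed is None or normalized in allowed


# Hypothetical subset, used only to show the stricter mode.
example_allowed = {"de", "us"}
```

A parser written against the syntactic check would keep working once a subset is defined, since the subset only narrows what is accepted.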
> How does the "Downloaded" keyword work with signed documents? How do you
> see it being used?
Signed documents are certainly a challenge. The issue is that this keyword
is already being used: CollecTor adds it. A better choice (back then)
would have been to use an annotation for this. But I think the `Created`
keyword will supersede this keyword anyway. Still, it's there, which is
why I included it in the spec. Maybe there's a better plan?
> On point 1, this sounds OK. I am starting to think of exit lists in the
> new scanner context as a derived format from the raw measurement results
> in a similar way that our current torperf files are derived from onionperf
> analysis results which are derived from tor/tgen logs.
>
> As an aside, the format we are deriving from will most likely be
> [[https://pathspider.readthedocs.io/en/latest/using.html#data-formats|PATHspider ndjson]].
> This is not important for the spec.
Makes sense.
> On point 2, this also sounds OK. Should we specify that an exit list
> should be used with a specific consensus in applications like ExoneraTor?
> I think no, we should always use the latest exit list and latest consensus
> to give the most up-to-date information available.
Agreed, we should leave this up to the application.
Changing back to needs_review for the open questions. Thanks!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29624#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online