[metrics-bugs] #30216 [Metrics/Library]: Add bandwidth file parser to metrics-lib
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Apr 24 18:47:19 UTC 2019
#30216: Add bandwidth file parser to metrics-lib
-------------------------------------------------+-------------------------
Reporter: irl | Owner: karsten
Type: enhancement | Status:
| needs_review
Priority: High | Milestone:
Component: Metrics/Library | Version:
Severity: Normal | Resolution:
Keywords: tor-bwauth,tor-dirauth,metrics- | Actual Points:
roadmap-2019-q2 |
Parent ID: #21378 | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Changes (by karsten):
* status: needs_revision => needs_review
Comment:
Thanks for the review!
Replying to [comment:7 irl]:
> {{{
> In version 1.0.0, Header List ends when the first relay bandwidth
> is found conforming to the next section.
> }}}
>
> but
>
> {{{
> switch (spaceSeparatedLineParts.length) {
> }}}
>
> is used to determine if you've reached the end of the headers. This
should instead search the line for "bw=" as future header lines may
contain spaces.
>
> {{{
> Additional header Lines MUST NOT use any keywords specified in the
> relay measurements format.
> }}}
>
> To be safer, it should be looking for either start of line or space
preceeding bw=. In PCRE, `(^| )bw=`. This test should also be disabled if
we've already seen a version declared that expects a terminator.
Right, I thought about this, too, and you have a point that bandwidth-
file-spec does not ''explicitly'' state whether future header lines can
ever contain spaces or not. However,
[https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt#n2134 dir-
spec] ''implicitly'' rules that out.
The only place where future header lines could contain a space is between
two or more KeyValue elements, which would basically be two or more header
lines combined. But why would anybody want to introduce such lines if they
can as well have two or more separate header lines?
How about we ask the bandwidth-file-spec authors to clarify whether this
is planned in the future? If there are no such plans, that is, header
lines with 1.x versions can never have spaces, and this is stated
explicitly in the spec, I'd like to keep this simple and efficient parser
implementation. Otherwise we can add a check like you suggested, which
certainly makes the parser more complex, but which would address this
case.
> When we are parsing country lists, ISO 2-alpha *can* be more than 2
characters, and we're looking at using that functionality for describing
CDNs in an extensible way in #30196. Codes will only be longer than 2
characters if they start "OO".
Hmm, [https://gitweb.torproject.org/torspec.git/tree/bandwidth-file-
spec.txt#n125 bandwidth-file-spec] explicitly says that country codes are
exactly two characters long.
It's certainly easy to change the regex here, it's just not following the
spec.
In fact, I'm not even sure if the authors can change the spec here for 1.x
versions, as a correct parser that can parse a valid 1.4 version with
exactly two country code characters wouldn't parse a 1.5 version with more
characters. This change might require new `CountryCodeExt` and
`CountryCodeExtList` nonterminals in newly defined header lines.
> Tests and checks are OK.
Great! My suggestion would be to merge and release this code as a valid
1.0 to 1.4 parser, ask spec authors for clarifications/amendments, and
possibly release any changes in the next metrics-lib version. It would be
good to unblock further code to archive bandwidth files.
What do you think?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30216#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list