[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Jul 28 20:39:20 UTC 2017


#22983: add a descriptor interface and implementation for web-logs
---------------------------------+-----------------------------------
 Reporter:  iwakeh               |          Owner:  metrics-team
     Type:  enhancement          |         Status:  needs_revision
 Priority:  Medium               |      Milestone:  metrics-lib 2.1.0
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+-----------------------------------
Changes (by karsten):

 * status:  needs_review => needs_revision


Comment:

 Hmm, no, as of today I'm not convinced that this is a good idea. It might
 be and I'm not seeing it yet, or how it would work for other sanitized
 descriptors. But I'd rather merge something simple in the next couple of
 days and not rush this somewhat major design change. I agree that there's
 no actual sanitizing code in metrics-lib. But except for this new
 interface, there's also no other interface in metrics-lib where one can
 plug in sanitizing code. Avoiding duplicate code is good, but keeping
 interfaces small and intuitive is good, too. Let's open a ticket for that
 and do it in a separate step when we have more time to think about it.

 Here's some more feedback on the `LogDescriptor` interface:
  - The documentation of `TYPE` says that the name should include a string.
 But who should make sure that this is the case? The application developer?
 Can you rephrase that to say what is expected here?
  - Some of the method descriptions are written in 3rd person ("Returns
 ..."), some in 2nd person ("Return ..."). I believe that 3rd person is
 preferred, though we're not doing that consistently in the rest of
 metrics-lib. But we should at least try to do it consistently in one
 interface.
  - "yyymmdd" -> "yyyymmdd" in `getLogDate()`
  - I'm unclear what to expect as return value from `getLogType()`. What
 types exist? Do I get a class name, or what?
  - Maybe rename `getLogMillis()` to `getLogDateMillis()` to indicate that
 we're just returning the milliseconds of the date at 00:00:00 UTC, not the
 milliseconds of whatever time of the day the log was produced.
  - Please avoid abbreviations like "msec" and instead write "milliseconds
 since the epoch", for consistency.
  - The JavaDoc for `getRawDescriptorBytes()` should not copy the known
 compression types, but refer to `getCompressionType()` for the list. Right
 now, there's already an inconsistency between the list regarding `bz2`.
  - Are uncompressed logs supported? The documentation for
 `getRawDescriptorBytes()` doesn't indicate that, nor does
 `getCompressionType()` say what it would return in that case.
  - Shouldn't `getRawDescriptorBytes()` return the uncompressed bytes and a
 separate method `getCompressedBytes()` return the compressed bytes?
 Thinking of being consistent with other descriptors where
 `getRawDescriptorBytes()` returns uncompressed bytes, too. Not sure about
 this one.
  - "added in future" -> "added in the future"
  - We briefly discussed above to include the first, say, 100 unrecognized
 lines in `getUnrecognizedLines()`, but the documentation says it doesn't
 reply any. Why? Because it's not implemented yet?
  - As explained above, let's take out the `Sanitizer` subinterface and
 related methods.

 I'd like to wait for your response and a revised branch before doing
 another review of the remaining code. I'm not around most of Saturday but
 can take a look after that. Or on Monday, if you'd rather enjoy your
 weekend. :) Thanks!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online


More information about the metrics-bugs mailing list