[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Jul 28 20:39:20 UTC 2017
#22983: add a descriptor interface and implementation for web-logs
---------------------------------+-----------------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: needs_revision
Priority: Medium | Milestone: metrics-lib 2.1.0
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+-----------------------------------
Changes (by karsten):
* status: needs_review => needs_revision
Comment:
Hmm, no, as of today I'm not convinced that this is a good idea. It might
be and I'm not seeing it yet, or how it would work for other sanitized
descriptors. But I'd rather merge something simple in the next couple of
days and not rush this somewhat major design change. I agree that there's
no actual sanitizing code in metrics-lib. But except for this new
interface, there's also no other interface in metrics-lib where one can
plug in sanitizing code. Avoiding duplicate code is good, but keeping
interfaces small and intuitive is good, too. Let's open a ticket for that
and do it in a separate step when we have more time to think about it.
Here's some more feedback on the `LogDescriptor` interface:
- The documentation of `TYPE` says that the name should include a string.
But who should make sure that this is the case? The application developer?
Can you rephrase that to say what is expected here?
- Some of the method descriptions are written in 3rd person ("Returns
..."), some in 2nd person ("Return ..."). I believe that 3rd person is
preferred, though we're not doing that consistently in the rest of
metrics-lib. But we should at least try to do it consistently in one
interface.
- "yyymmdd" -> "yyyymmdd" in `getLogDate()`
- I'm unclear what to expect as return value from `getLogType()`. What
types exist? Do I get a class name, or what?
- Maybe rename `getLogMillis()` to `getLogDateMillis()` to indicate that
we're just returning the milliseconds of the date at 00:00:00 UTC, not the
milliseconds of whatever time of the day the log was produced.
- Please avoid abbreviations like "msec" and instead write "milliseconds
since the epoch", for consistency.
- The JavaDoc for `getRawDescriptorBytes()` should not copy the known
compression types, but refer to `getCompressionType()` for the list. Right
now, there's already an inconsistency between the list regarding `bz2`.
- Are uncompressed logs supported? The documentation for
`getRawDescriptorBytes()` doesn't indicate that, nor does
`getCompressionType()` say what it would return in that case.
- Shouldn't `getRawDescriptorBytes()` return the uncompressed bytes and a
separate method `getCompressedBytes()` return the compressed bytes?
Thinking of being consistent with other descriptors where
`getRawDescriptorBytes()` returns uncompressed bytes, too. Not sure about
this one.
- "added in future" -> "added in the future"
- We briefly discussed above to include the first, say, 100 unrecognized
lines in `getUnrecognizedLines()`, but the documentation says it doesn't
reply any. Why? Because it's not implemented yet?
- As explained above, let's take out the `Sanitizer` subinterface and
related methods.
I'd like to wait for your response and a revised branch before doing
another review of the remaining code. I'm not around most of Saturday but
can take a look after that. Or on Monday, if you'd rather enjoy your
weekend. :) Thanks!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:12>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list