[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Jul 20 19:34:48 UTC 2017
#22983: add a descriptor interface and implementation for web-logs
---------------------------------+------------------------------
 Reporter:  iwakeh               |          Owner:  metrics-team
     Type:  enhancement          |         Status:  new
 Priority:  Medium               |      Milestone:
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+------------------------------
Comment (by karsten):
Regarding the name, let's try to find something more descriptive. How
about `WebServerLog` or even `ApacheHttpServerAccessLog`? Otherwise
there's the risk of confusion with descriptor types added in the future,
like a log file written by BridgeDB containing client requests for bridge
addresses.
Regarding the suggested interface, I think there's a short term and a long
term part here.
In the long term I think that it would be at least twice as useful if we
read the log contents and added methods to read these parsed contents.
It's true that this causes some development hassle. But that's why we do
it once in the library rather than rely on possibly more than one
application to get it right. And we can still include the raw descriptor
bytes by storing the compressed bytes and inflating them upon request.
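To illustrate the last point, here is a minimal sketch of keeping only the compressed bytes in memory and inflating them on request. It assumes gzip compression; the class name `CompressedLogBytes` and the `gzip()` helper are hypothetical and exist only for this example, they are not part of metrics-lib.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical holder that keeps only gzip-compressed bytes in memory. */
class CompressedLogBytes {

  private final byte[] compressed;

  CompressedLogBytes(byte[] compressed) {
    this.compressed = compressed.clone();
  }

  /** Inflates the stored bytes on every call instead of caching them. */
  byte[] getRawDescriptorBytes() throws IOException {
    try (GZIPInputStream in =
            new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
      }
      return out.toByteArray();
    }
  }

  /** Helper for this example only: gzip-compresses the given bytes. */
  static byte[] gzip(byte[] plain) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(plain);
    }
    return out.toByteArray();
  }
}
```

The trade-off in this sketch is CPU time for memory: each call inflates again, which seems acceptable if applications mostly use the parsed contents and rarely ask for the raw bytes.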
Some comments on the interface:
- Let's include a subtype `Request` or similar for each line contained in
the log file, and let's include a method `getRequests()` that returns
`Iterable<Request>`.
- Because we cannot include a `@type` annotation with a version number,
`Request` should ideally include getters for all fields contained in
Apache's Combined Log Format.
- Ideally, `getLogDate()` would return the date in milliseconds since the
epoch to be consistent with the rest of metrics-lib, in which case it would
probably be called `getLogMillis()`.
- I'm unclear what `getCompressionType()` returns. I think I'd expect a
`String` naming the compression format, such as `"gz"`, but not a `byte[]`.
Was that intended?
- If we read and parse logs, we'll have to change
`getUnrecognizedLines()` to return any unrecognized lines.
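The interface comments above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `WebServerAccessLogSketch` and all `Request` getter names are hypothetical, the regular expression is a simplification of Apache's Combined Log Format that does not handle escaped quotes inside quoted fields, and timestamps are assumed to use English month abbreviations.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; these names are not existing metrics-lib API. */
class WebServerAccessLogSketch {

  /** One log line with a getter per Combined Log Format field. */
  static final class Request {
    private final String[] f;
    private final long millis;

    Request(String[] f, long millis) {
      this.f = f;
      this.millis = millis;
    }

    String getRemoteHost() { return f[0]; }
    String getRemoteLogname() { return f[1]; }
    String getRemoteUser() { return f[2]; }
    /** Request time in milliseconds since the epoch. */
    long getRequestMillis() { return millis; }
    String getRequestLine() { return f[4]; }
    int getStatus() { return Integer.parseInt(f[5]); }
    /** Response size in bytes, or -1 if the log contains "-". */
    long getResponseBytes() {
      return "-".equals(f[6]) ? -1L : Long.parseLong(f[6]);
    }
    String getReferer() { return f[7]; }
    String getUserAgent() { return f[8]; }
  }

  // host logname user [time] "request" status bytes "referer" "user-agent"
  private static final Pattern COMBINED = Pattern.compile(
      "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-)"
      + " \"([^\"]*)\" \"([^\"]*)\"$");

  private static final DateTimeFormatter TIME =
      DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

  private final List<Request> requests = new ArrayList<>();
  private final List<String> unrecognizedLines = new ArrayList<>();

  WebServerAccessLogSketch(String contents) {
    for (String line : contents.split("\n")) {
      if (line.isEmpty()) {
        continue;
      }
      Matcher m = COMBINED.matcher(line);
      if (!m.matches()) {
        unrecognizedLines.add(line);
        continue;
      }
      try {
        long millis = OffsetDateTime.parse(m.group(4), TIME)
            .toInstant().toEpochMilli();
        String[] f = new String[9];
        for (int i = 0; i < 9; i++) {
          f[i] = m.group(i + 1);
        }
        requests.add(new Request(f, millis));
      } catch (DateTimeParseException e) {
        // A line with an unparseable timestamp counts as unrecognized.
        unrecognizedLines.add(line);
      }
    }
  }

  Iterable<Request> getRequests() {
    return requests;
  }

  List<String> getUnrecognizedLines() {
    return unrecognizedLines;
  }
}
```

In this sketch a line that does not match the format ends up in `getUnrecognizedLines()` rather than aborting the parse, so an application can still process the remaining requests.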
In the short term I can see how we might want to put the `Request` part on
hold and only return metadata and uncompressed raw descriptor contents in
this new descriptor type.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online