[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Jul 21 07:56:10 UTC 2017


#22983: add a descriptor interface and implementation for web-logs
---------------------------------+------------------------------
 Reporter:  iwakeh               |          Owner:  metrics-team
     Type:  enhancement          |         Status:  new
 Priority:  Medium               |      Milestone:
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+------------------------------

Comment (by iwakeh):

 Replying to [comment:4 karsten]:

 Thanks for the valuable input!

 > Regarding the name, let's try to find something more descriptive. How
 about `WebServerLog` or even `ApacheHttpServerAccessLog`? Otherwise
 there's the risk of confusion with descriptor types added in the future,
 like a log file written by BridgeDB containing client requests for bridge
 addresses.

 I see an interface hierarchy here: LogDescriptor as the parent of all
 logs (the individual log types then drop 'Descriptor' from their names),
 with WebServerAccessLog as the first extending interface. Later we can add
 other *Log interfaces, like BridgeDbClientLog.
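 A minimal Java sketch of that hierarchy, for discussion only. The
 interface names come from this thread; the method set (raw bytes,
 compression type, unrecognized lines, log date) is an assumption pieced
 together from the points raised in this ticket, not an existing
 metrics-lib API.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical parent interface for all log types (names from this ticket,
// not an existing metrics-lib API).
interface LogDescriptor {
  // Uncompressed log contents.
  byte[] getRawDescriptorBytes();

  // Compression of the original file, e.g. "gz".
  String getCompressionType();

  // Lines the parser did not understand.
  List<String> getUnrecognizedLines();
}

// First extending interface: web server access logs.
interface WebServerAccessLog extends LogDescriptor {
  // Only a date is available for these logs, no time of day.
  LocalDate getLogDate();
}

// Possible later sibling, e.g. for BridgeDB client requests.
interface BridgeDbClientLog extends LogDescriptor {
}
```

 Keeping the shared methods in LogDescriptor lets code that only needs raw
 bytes or unrecognized lines handle every log type the same way.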

 So for now, I'll focus on the access-log integration and keep future
 extensions in mind for the design.

 >
 > Regarding the suggested interface, I think there's a short term and a
 long term part here.
 >
 > In the long term I think that it would be at least twice as useful if we
 read the log contents and added methods to read these parsed contents.
 It's true that this causes some development hassle. But that's why we do
 it once in the library rather than rely on possibly more than one
 application to get it right. And we can still include the raw descriptor
 bytes by storing the compressed bytes and inflating them upon request.

 Yes, I partially have this anyway for sanitizing the logs. I'll add
 generally useful functionality to the metrics-lib code.
 Should we have a new package for the implementations, like
 `org.torproject.logs`? Log processing and log contents differ quite a bit
 from those of the usual descriptors.
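 The idea of keeping only the compressed bytes and inflating them on
 request could look roughly like this. It is a sketch assuming gzip (the
 ticket does not fix a format); `CompressedLogBytes` and `fromRaw` are
 hypothetical names, and only `java.util.zip` stream classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical holder: keeps only gzip-compressed bytes in memory and
// inflates them whenever the raw contents are requested.
final class CompressedLogBytes {
  private final byte[] compressed;

  CompressedLogBytes(byte[] compressed) {
    this.compressed = compressed.clone();
  }

  // Compress raw log bytes with gzip; closing the stream writes the trailer.
  static CompressedLogBytes fromRaw(byte[] raw) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new CompressedLogBytes(bos.toByteArray());
  }

  int compressedLength() {
    return compressed.length;
  }

  // Inflates on every call; callers needing the bytes repeatedly should cache.
  byte[] getRawDescriptorBytes() {
    try (GZIPInputStream gz =
             new GZIPInputStream(new ByteArrayInputStream(compressed));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = gz.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

 This trades CPU on each access for a smaller memory footprint, which
 matters most if log files turn out to be large.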

 >
 > Some comments on the interface:
 >  - Let's include a subtype `Request` or similar for each line contained
 in the log file, and let's include a method `getRequests()` that returns
 `Iterable<Request>`.

 There could be a parent interface LogLine that is extended by an
 appropriate interface for each log type, like a Request interface for
 access logs.
 I'll think about it and will definitely keep the design open for this
 addition, but would give it lower priority right now.
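 To make the LogLine/Request idea concrete, here is a rough sketch with one
 getter per field of Apache's Combined Log Format
 (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`) and a
 `getRequests()` returning `Iterable<Request>`. All names are hypothetical,
 and the regular expression is a simplification that assumes quoted fields
 contain no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parent interface for one line of any log type.
interface LogLine {
  String getRawLine();
}

// Hypothetical access-log line: one getter per Combined Log Format field.
interface Request extends LogLine {
  String getRemoteHost();
  String getIdentity();
  String getUser();
  String getTimestamp();    // contents of the [...] field, not parsed here
  String getRequestLine();
  int getStatus();
  long getBytes();          // -1 when the log contains "-"
  String getReferer();
  String getUserAgent();
}

final class CombinedLogParser {
  // Simplification: quoted fields must not contain escaped quotes.
  private static final Pattern COMBINED = Pattern.compile(
      "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (-|\\d+)"
      + " \"([^\"]*)\" \"([^\"]*)\"$");

  private CombinedLogParser() {}

  // Returns null for non-matching lines, so the caller can record them
  // as unrecognized lines instead.
  static Request parse(String line) {
    Matcher m = COMBINED.matcher(line);
    if (!m.matches()) {
      return null;
    }
    String[] g = new String[10];
    for (int i = 1; i <= 9; i++) {
      g[i] = m.group(i);
    }
    long bytes = "-".equals(g[7]) ? -1L : Long.parseLong(g[7]);
    int status = Integer.parseInt(g[6]);
    return new Request() {
      public String getRawLine() { return line; }
      public String getRemoteHost() { return g[1]; }
      public String getIdentity() { return g[2]; }
      public String getUser() { return g[3]; }
      public String getTimestamp() { return g[4]; }
      public String getRequestLine() { return g[5]; }
      public int getStatus() { return status; }
      public long getBytes() { return bytes; }
      public String getReferer() { return g[8]; }
      public String getUserAgent() { return g[9]; }
    };
  }

  // Iterable over all lines that parsed successfully.
  static Iterable<Request> getRequests(List<String> lines) {
    List<Request> requests = new ArrayList<>();
    for (String line : lines) {
      Request request = parse(line);
      if (request != null) {
        requests.add(request);
      }
    }
    return requests;
  }
}
```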

 >  - Due to the fact that we cannot include a `@type` annotation with a
 version number, `Request` should ideally include getters for all fields
 contained in Apache's Combined Log Format.
 >  - Ideally, `getLogDate()` would return the date in milliseconds since
 the epoch to be conformant to the rest of metrics-lib, in which case it
 would probably be called `getLogMillis()`.

 Fine, but we only have the date here, no time. Thus, milliseconds would
 signal a precision we don't offer.
 I don't feel strongly about that.
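 One way to reconcile both views: keep the log date as a `LocalDate`, which
 carries no time of day, and derive milliseconds only when asked. This
 sketch pins the millis to 00:00 UTC by assumption and assumes a
 `yyyyMMdd` date string; `LogDates` and its methods are hypothetical.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helpers: the log date stays a LocalDate and is converted to
// milliseconds since the epoch only on request.
final class LogDates {
  private LogDates() {}

  // Assumes the date is given as yyyyMMdd, e.g. "20170720".
  static LocalDate parseLogDate(String yyyyMMdd) {
    return LocalDate.parse(yyyyMMdd, DateTimeFormatter.BASIC_ISO_DATE);
  }

  // Millis at 00:00 UTC of that date; the time of day is a convention,
  // not information contained in the log.
  static long getLogMillis(LocalDate logDate) {
    return logDate.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
  }
}
```

 For example, `getLogMillis(parseLogDate("20170720"))` yields
 1500508800000, a value that is always a whole multiple of one day.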

 >  - I'm unclear what `getCompressionType()` returns. I think I'd expect a
 `String` that is either `"gz"` or `"gz"`, but not a `byte[]`. Was that
 intended?

 Correct, this should read `String getCompressionType()`, just a typo.
 Actually, it might turn into an enum.
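 If it does become an enum, it might look like this sketch. Only `gz` is
 taken from the discussion; `NONE` and `XZ` are assumptions added for
 illustration, and `fromExtension` is a hypothetical helper.

```java
import java.util.Locale;

// Hypothetical enum replacing a String return value of getCompressionType().
enum CompressionType {
  NONE(""), GZ("gz"), XZ("xz");

  private final String extension;

  CompressionType(String extension) {
    this.extension = extension;
  }

  String getExtension() {
    return extension;
  }

  // Maps a file-name extension (case-insensitive) to a constant and
  // rejects unknown extensions.
  static CompressionType fromExtension(String ext) {
    String e = ext.toLowerCase(Locale.ROOT);
    for (CompressionType t : values()) {
      if (t.extension.equals(e)) {
        return t;
      }
    }
    throw new IllegalArgumentException("unknown compression: " + ext);
  }
}
```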

 >  - If we read and parse logs, we'll have to change
 `getUnrecognizedLines()` to return any unrecognized lines.

 Yes, maybe with an upper limit in case a log got mangled?
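 A possible shape for such a limit, as a sketch: keep at most `limit`
 unrecognized lines but still count all of them, so a mangled log stays
 detectable without holding every bad line in memory. The class and its
 methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collector for unrecognized lines with an upper limit.
final class UnrecognizedLines {
  private final int limit;
  private final List<String> kept = new ArrayList<>();
  private long total;

  UnrecognizedLines(int limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit < 0");
    }
    this.limit = limit;
  }

  // Counts every line but stores only the first `limit` ones.
  void add(String line) {
    total++;
    if (kept.size() < limit) {
      kept.add(line);
    }
  }

  List<String> getUnrecognizedLines() {
    return Collections.unmodifiableList(kept);
  }

  long getTotalUnrecognized() {
    return total;
  }

  // True if some unrecognized lines were dropped because of the limit.
  boolean isTruncated() {
    return total > kept.size();
  }
}
```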

 >
 > In the short term I can see how we might want to put the `Request` part
 on hold and only return metadata and uncompressed raw descriptor contents
 in this new descriptor type.

 Fine, as replied above.

 Do you have a rough estimate of the future log file sizes metrics-lib will
 have to deal with?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

