[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Jul 26 15:38:09 UTC 2017
#22983: add a descriptor interface and implementation for web-logs
---------------------------------+------------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+------------------------------
Changes (by iwakeh):
* status: new => needs_review
Comment:
Please review [https://gitweb.torproject.org/user/iwakeh/metrics-lib.git/log/?h=task-22983 this branch].
The [https://gitweb.torproject.org/user/iwakeh/metrics-lib.git/commit/?h=task-22983&id=20a9f82d06adbf960f1da8ff9853e50c5c1c5e25 first commit]
adds the new interfaces and their implementations.
LogDescriptor contains the methods that will hopefully apply to all
possible log types.
WebServerAccessLog is the specialization for access-logs.
LogDescriptor also offers a sub-interface:
{{{
/**
 * Providing a single function for removing sensitive data from a
 * given Apache Access Log log line.
 */
public interface Sanitizer {
  /** Returns a cleaned log line, i.e., without possibly privacy
   * sensitive values. */
  public String clean(String line);
}
}}}
and a method `sanitize()`. The latter applies the cleaning procedure to
all log lines and sorts the resulting lines. The default sanitizer
returns each line without any changes. This setup keeps all descriptor
parsing, compression, and decompression in metrics-lib; CollecTor is not
forced to re-implement parsing functionality and only needs to provide the
log cleaning procedure. (A similar approach could be devised for
bridge sanitization, too.)
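To illustrate the division of labor described above, here is a minimal sketch of how a caller such as CollecTor might plug in its own cleaning procedure. It is not the code from the branch: the class `SanitizeSketch`, the static helper `sanitize(List, Sanitizer)`, the `IDENTITY` constant, and the address-masking rule are all assumptions made for illustration; only the `Sanitizer` interface shape and the "clean every line, then sort" behavior come from the description above. The lambda requires Java 8.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the described sanitize() behavior. */
class SanitizeSketch {

  /** Same shape as the Sanitizer interface quoted in this comment. */
  interface Sanitizer {
    String clean(String line);
  }

  /** Pass-through sanitizer, mirroring the described default (assumed name). */
  static final Sanitizer IDENTITY = line -> line;

  /** Cleans every line with the given sanitizer and sorts the result. */
  static List<String> sanitize(List<String> lines, Sanitizer sanitizer) {
    List<String> cleaned = new ArrayList<>();
    for (String line : lines) {
      cleaned.add(sanitizer.clean(line));
    }
    Collections.sort(cleaned);
    return cleaned;
  }

  public static void main(String[] args) {
    // Illustrative rule only: replace the first whitespace-delimited
    // field (the client address in common access-log formats).
    Sanitizer maskAddress = line -> line.replaceFirst("^\\S+", "0.0.0.0");

    List<String> raw = Arrays.asList(
        "10.0.0.2 - - \"GET /b HTTP/1.1\" 200 10",
        "10.0.0.1 - - \"GET /a HTTP/1.1\" 200 20");

    for (String line : sanitize(raw, maskAddress)) {
      System.out.println(line);
    }
  }
}
```

With this split, the caller supplies only the `clean` logic, while iteration and sorting (and, in the real library, parsing and compression) stay on the library side.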
The [https://gitweb.torproject.org/user/iwakeh/metrics-lib.git/commit/?h=task-22983&id=d4ece5649573f315a8c63f43e490c3594f35affd second commit]
makes `DescriptorParser` aware of the new types and avoids generating
javadoc for the implementation classes in the new package.
All of the code is covered by tests which are added in
[https://gitweb.torproject.org/user/iwakeh/metrics-lib.git/commit/?h=task-22983&id=e07bca5e9429b2b93bb2cd3c0ef6911ad42ec32e this commit].
Total coverage even improved by one percent :-)
The addition of another sub-interface `LogDescriptor.LogLine` (and the
extensions to WebServerAccessLogLine) will be part of a new ticket, which
will also provide unrecognized lines for access-logs.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online