[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Jul 28 09:08:26 UTC 2017
#22983: add a descriptor interface and implementation for web-logs
---------------------------------+-----------------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: needs_review
Priority: Medium | Milestone: metrics-lib 2.1.0
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+-----------------------------------
Comment (by iwakeh):
Replying to [comment:10 karsten]:
> I need more time for this review. But here's a first question:
>
> Should we really put the sanitizing code into metrics-lib rather than
> CollecTor? That's an important design decision and a change to what we
> have been doing in the past. Where would this code be used other than in
> CollecTor? So far, metrics-lib has primarily been the client-side library
> for applications using CollecTor data. But this change would turn it into
> a library that both the CollecTor server and its clients depend on. I'm
> yet undecided whether this is a good idea or not. In any case, we should
> discuss this first.
The sanitizing code **is not** part of metrics-lib, so we agree here.
In the proposed patch, metrics-lib only allows sanitizing code to be
plugged in from the 'outside' via the method
{{{
public void setSanitizer(LogDescriptor.Sanitizer sani);
}}}
where `Sanitizer` is just a functional interface (i.e., it has a single
method) that can be implemented by a lambda expression once we move to
Java 8, but that's an aside.
The given `Sanitizer` is applied when `sanitize()` is called. The
resulting lines are sorted by metrics-lib.
One choice I made is to fall back to a default identity sanitizer if none
was set, instead of raising an exception.
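To make the description above concrete, here is a minimal sketch of the idea, not the code from the patch. The class name `SketchLogDescriptor`, the method name `sanitizeLine`, its `String`-to-`String` signature, and the line-based storage are all assumptions made for illustration; only `setSanitizer`, `sanitize()`, the identity default, and the sorting are taken from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the proposed descriptor; not the metrics-lib class. */
class SketchLogDescriptor {

  /** Single-method interface, so a Java 8 lambda can implement it. */
  interface Sanitizer {
    /** Assumed signature: takes one log line, returns the sanitized line. */
    String sanitizeLine(String line);
  }

  /** Default used when no sanitizer was set: returns each line unchanged. */
  private static final Sanitizer IDENTITY = new Sanitizer() {
    @Override
    public String sanitizeLine(String line) {
      return line;
    }
  };

  private final List<String> lines;
  private Sanitizer sanitizer = IDENTITY;

  SketchLogDescriptor(List<String> lines) {
    this.lines = new ArrayList<>(lines);
  }

  /** The sanitizing code itself is supplied by the caller, e.g. CollecTor. */
  public void setSanitizer(Sanitizer sani) {
    this.sanitizer = sani;
  }

  /** Applies the configured sanitizer to every line, then sorts the result. */
  public List<String> sanitize() {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      result.add(sanitizer.sanitizeLine(line));
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) {
    SketchLogDescriptor desc = new SketchLogDescriptor(
        Arrays.asList("10.0.0.2 GET /b", "10.0.0.1 GET /a"));
    // Java 8 lambda; the replacement rule here is purely illustrative.
    desc.setSanitizer(line -> line.replaceFirst("^\\S+", "0.0.0.0"));
    System.out.println(desc.sanitize());
  }
}
```

In this sketch, calling `sanitize()` without a prior `setSanitizer(...)` simply returns the original lines in sorted order, which mirrors the identity default instead of an exception.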
With this approach, metrics-lib stays agnostic of the sanitizing code but
provides everything else (compression, decompression, etc.). This avoids
duplicating code and lets us add performance and space optimizations
'under the hood' once they are needed. I hope this explains my reasoning.
CollecTor already depends on metrics-lib, as it uses `Descriptor`s of all
kinds as well as the parser and reader from metrics-lib.
(I noticed that I forgot to add a comment to
`LogDescriptor.setSanitizer()`. I'll add a fixup commit, but that
shouldn't hinder the review.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online