[metrics-bugs] #22983 [Metrics/metrics-lib]: add a descriptor interface and implementation for web-logs

Tor Bug Tracker & Wiki blackhole at torproject.org
Fri Jul 28 09:08:26 UTC 2017


#22983: add a descriptor interface and implementation for web-logs
---------------------------------+-----------------------------------
 Reporter:  iwakeh               |          Owner:  metrics-team
     Type:  enhancement          |         Status:  needs_review
 Priority:  Medium               |      Milestone:  metrics-lib 2.1.0
Component:  Metrics/metrics-lib  |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:                       |  Actual Points:
Parent ID:                       |         Points:
 Reviewer:                       |        Sponsor:
---------------------------------+-----------------------------------

Comment (by iwakeh):

 Replying to [comment:10 karsten]:
 > I need more time for this review. But here's a first question:
 >
 > Should we really put the sanitizing code into metrics-lib rather than
 > CollecTor? That's an important design decision and a change to what we
 > have been doing in the past. Where would this code be used other than in
 > CollecTor? So far, metrics-lib has primarily been the client-side library
 > for applications using CollecTor data. But this change would turn it into
 > a library that both the CollecTor server and its clients depend on. I'm
 > yet undecided whether this is a good idea or not. In any case, we should
 > discuss this first.

 The sanitizing code **is not** part of metrics-lib.  Thus, we agree here.
 In the proposed patch, metrics-lib allows sanitizing code to be plugged in
 from the 'outside' through the method
 {{{
 public void setSanitizer(LogDescriptor.Sanitizer sani);
 }}}
 and `Sanitizer` is just a functional interface (i.e., it has a single
 abstract method) that can be implemented by a lambda expression once we
 move to Java 8, but that's an aside.
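
 To illustrate the idea of plugging in a sanitizer from the outside, here
 is a minimal sketch, not the code from the patch.  Only `setSanitizer`
 and `Sanitizer` are named above; the class `WebLogSketch`, the method
 name `sanitizeLine`, its line-in/line-out signature, and the helper
 `applySanitizer` are assumptions made for this example.  It needs Java 8
 because of the lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the patch's LogDescriptor; all names and
// signatures here are assumptions, not the actual metrics-lib API.
class WebLogSketch {

  // A single abstract method, so a lambda can implement it on Java 8.
  interface Sanitizer {
    // Assumed signature: one raw log line in, one sanitized line out.
    String sanitizeLine(String line);
  }

  private final List<String> lines;
  private Sanitizer sanitizer;

  WebLogSketch(List<String> lines) {
    this.lines = new ArrayList<>(lines);
  }

  // The library only stores the sanitizer; the logic lives in the caller.
  void setSanitizer(Sanitizer sani) {
    this.sanitizer = sani;
  }

  // Applies the caller-supplied sanitizer to every line.
  List<String> applySanitizer() {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      result.add(sanitizer.sanitizeLine(line));
    }
    return result;
  }

  public static void main(String[] args) {
    WebLogSketch log =
        new WebLogSketch(Arrays.asList("1.2.3.4 GET /index.html"));

    // Java 7 style: anonymous class supplied by the calling application.
    log.setSanitizer(new Sanitizer() {
      @Override
      public String sanitizeLine(String line) {
        // Replace the first field (here: the client address).
        return line.replaceFirst("^\\S+", "0.0.0.0");
      }
    });
    System.out.println(log.applySanitizer());

    // Java 8 style: the same sanitizer as a lambda expression.
    log.setSanitizer(line -> line.replaceFirst("^\\S+", "0.0.0.0"));
    System.out.println(log.applySanitizer());
  }
}
```

 In this sketch the library class never inspects what the sanitizer does;
 it only calls it, which is the separation described above.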

 The given `Sanitizer` is applied when `sanitize()` is called.  The
 resulting lines are sorted by metrics-lib.

 One choice I made: if no sanitizer has been set, a default identity
 sanitizer is used instead of raising an exception.
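
 The behavior described in the last two paragraphs (apply the sanitizer in
 `sanitize()`, sort the resulting lines, fall back to an identity
 sanitizer when none was set) could be sketched as follows.  This is an
 illustration under assumptions, not the patch itself: the class name
 `SanitizeSketch`, the method name `sanitizeLine`, and the use of plain
 string ordering for the sort are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the actual metrics-lib classes.
class SanitizeSketch {

  interface Sanitizer {
    // Assumed signature: one raw log line in, one sanitized line out.
    String sanitizeLine(String line);
  }

  // Identity sanitizer: returns every line unchanged.
  private static final Sanitizer IDENTITY = line -> line;

  private final List<String> lines;

  // Default to the identity sanitizer rather than failing later.
  private Sanitizer sanitizer = IDENTITY;

  SanitizeSketch(List<String> lines) {
    this.lines = new ArrayList<>(lines);
  }

  void setSanitizer(Sanitizer sani) {
    this.sanitizer = sani;
  }

  // Applies the current sanitizer, then sorts the resulting lines.
  // Natural String ordering is an assumption made for this sketch.
  List<String> sanitize() {
    List<String> result = new ArrayList<>();
    for (String line : lines) {
      result.add(sanitizer.sanitizeLine(line));
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) {
    SanitizeSketch log = new SanitizeSketch(
        Arrays.asList("b.example GET /b", "a.example GET /a"));

    // No sanitizer set: lines pass through unchanged, but sorted.
    System.out.println(log.sanitize());
    // prints [a.example GET /a, b.example GET /b]

    // Caller-supplied sanitizer that drops the first field.
    log.setSanitizer(line -> line.substring(line.indexOf(' ') + 1));
    System.out.println(log.sanitize());
    // prints [GET /a, GET /b]
  }
}
```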

 With this approach, metrics-lib is agnostic of the sanitizing code but
 provides everything else (compression, decompression, etc.), which avoids
 duplicating code and lets us implement performance and space
 optimizations 'under the hood' once they are needed.  Hope this explains
 my reasoning.

 CollecTor already depends on metrics-lib, as it uses `Descriptor`s of all
 kinds as well as the parser and reader from metrics-lib.

 (I noticed that I forgot to add a comment to
 `LogDescriptor.setSanitizer()`.  I'll add a fixup commit, but that
 shouldn't hinder the review.)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

