[metrics-bugs] #22983 [Metrics/Library]: Add a Descriptor subinterface and implementation for Tor web server logs
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Nov 22 11:08:46 UTC 2017
#22983: Add a Descriptor subinterface and implementation for Tor web server logs
-----------------------------+-----------------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: needs_revision
Priority: Medium | Milestone: metrics-lib 2.2.0
Component: Metrics/Library | Version:
Severity: Normal | Resolution:
Keywords: metrics-2017 | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+-----------------------------------
Comment (by karsten):
Agreed with all points above except one:
When ''parsing'' sanitized log lines, metrics-lib should not reject log
lines that it would discard when ''sanitizing'' original log lines.
It's not the job of the ''parser'' to ensure that its input is properly
sanitized or to do some sort of post-sanitizing. Of course it needs to
perform some basic format checks to do its job. But dropping lines
because the sanitizer would drop them seems out of place.
Imagine a hypothetical situation where we decide at some point that HEAD
requests are too sensitive and we take them out in the parser. However,
previously sanitized logs would still contain them, including archives
that people keep locally and that we can't update. If somebody then takes
a recent metrics-lib version to parse their data, they would suddenly
stop getting the HEAD lines. That would be rather confusing.
I think sanitizing and parsing should be separate things. In this case,
discarding lines because of certain field contents should be left to the
sanitizer.
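To illustrate that separation, here is a minimal Java sketch of a
sanitizer-side filter. `SanitizerPolicy`, `methodOf`, and the example
method sets are hypothetical illustrations, not actual Tor Metrics code,
and real sanitizing does more than filtering (only the filtering step is
shown). The point is that the set of kept request methods lives only in
the sanitizer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sanitizer-side filter: decides which ORIGINAL log lines
 * are kept, based on the request method. Other sanitizing steps are
 * omitted from this sketch.
 */
final class SanitizerPolicy {

  private final Set<String> keptMethods;

  SanitizerPolicy(Set<String> keptMethods) {
    // Defensive copy; HashSet.contains(null) simply returns false.
    this.keptMethods = new HashSet<>(keptMethods);
  }

  /** Returns the token after the first double quote, or null if absent. */
  static String methodOf(String line) {
    int quote = line.indexOf('"');
    if (quote < 0) {
      return null;
    }
    int space = line.indexOf(' ', quote + 1);
    return space < 0 ? null : line.substring(quote + 1, space);
  }

  /** Keeps only lines whose request method is in the configured set. */
  List<String> sanitize(List<String> originalLines) {
    return originalLines.stream()
        .filter(line -> keptMethods.contains(methodOf(line)))
        .collect(Collectors.toList());
  }
}
```

Switching from a hypothetical `Set.of("GET", "HEAD")` policy to
`Set.of("GET")` changes only what newly sanitized logs contain. Archives
sanitized earlier keep their HEAD lines, and a parser that never consults
this set keeps returning them.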
Does that mean we should provide a general-purpose log parser? Probably
not. In the parser we don't have to provide getters for fields that we
don't care about, like the user-agent string. But we should be prepared to
find request methods GET, HEAD, POST, or really anything else in log lines
we're given.
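A minimal Java sketch of a lenient parser in this spirit follows. The
class `SanitizedLogLine`, its getters, and the assumed line layout (a
Common Log Format-like prefix `IP - - [DATE] "METHOD PATH PROTOCOL"
STATUS SIZE`) are hypothetical, not metrics-lib's actual API or the
spec's exact format. It checks only the overall layout, accepts any
request-method token, and has no getter for the user-agent string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for one sanitized web server log line.
 * ASSUMPTION: lines start with IP - - [DATE] "METHOD PATH PROTOCOL"
 * STATUS SIZE, optionally followed by further fields that are ignored.
 */
final class SanitizedLogLine {

  // METHOD is any non-space token, so GET, HEAD, POST and unknown
  // methods are all accepted. Anything after SIZE is skipped.
  private static final Pattern LINE = Pattern.compile(
      "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" "
      + "(\\d{3}) (\\d{1,18}|-)(?: .*)?$");

  private final String ip;
  private final String date;
  private final String method;
  private final String path;
  private final String protocol;
  private final int status;
  private final long size; // -1 when the log says "-"

  private SanitizedLogLine(String ip, String date, String method,
      String path, String protocol, int status, long size) {
    this.ip = ip;
    this.date = date;
    this.method = method;
    this.path = path;
    this.protocol = protocol;
    this.status = status;
    this.size = size;
  }

  /** Returns the parsed line, or empty if the basic format check fails. */
  static Optional<SanitizedLogLine> parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.matches()) {
      return Optional.empty();
    }
    long size = "-".equals(m.group(7)) ? -1L : Long.parseLong(m.group(7));
    return Optional.of(new SanitizedLogLine(m.group(1), m.group(2),
        m.group(3), m.group(4), m.group(5),
        Integer.parseInt(m.group(6)), size));
  }

  // Getters only for fields we care about; deliberately no user agent.
  String getIp() { return ip; }
  String getDate() { return date; }
  String getMethod() { return method; }
  String getPath() { return path; }
  String getProtocol() { return protocol; }
  int getStatus() { return status; }
  long getSize() { return size; }
}
```

Filtering by method, if a caller wants it, would then happen on the
caller's side, for example via
`parse(line).filter(l -> "GET".equals(l.getMethod()))`.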
Does that make sense, or am I overlooking something?
(By the way, it's a good thing that we're keeping the spec unchanged with
regard to IP addresses not starting with `0.0.0.`. I think it would have
been pretty bad to just rewrite the first three octets to `0.0.0` and keep
the fourth unchanged. Not very privacy-preserving.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:50>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online