[tor-bugs] #2489 [Website]: Set up new web server logging and log analysis infrastructure
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Tue Feb 8 09:47:12 UTC 2011
#2489: Set up new web server logging and log analysis infrastructure
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: phobos
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Website | Version:
Keywords: | Points:
Parent: |
-------------------------+--------------------------------------------------
Comment(by karsten):
I'm mostly ignoring this task because, as I hear, it's on Runa's list. A
few thoughts anyway:
I think for metrics we made a reasonable decision to use only the data
that we publish. (I'm a terrible fortuneteller, so I cannot promise we'll
never have to break this principle. So far we haven't, and if we ever have
to, we'll tell the world/or-dev beforehand.) This decision ensures that we're
not collecting too much data in the first place. We may even get feedback
from the community if we do collect too much data. Also, we're less
susceptible to attacks, because we don't have any secret data. And it's
a question of fairness towards other researchers who don't happen to run a
Tor network and who want to do Tor research.
The situation with web logs may be comparable to the bridge descriptor
situation. We cannot publish raw bridge descriptors. Instead, we're
spending significant resources, mostly developer time, on sanitizing
bridge descriptors before publishing and analyzing them. Most
importantly, we're not analyzing the original descriptors at all.
How about we implement a similar sanitizing process for web logs? Maybe
we can replace sensitive data with other data that allows us to keep track
of user sessions without giving away any other user data. I admit it will
eat a lot of development resources (here: Runa). In addition to that, we
should look at the original logs and reduce the details that are too
sensitive and that we're not going to use anyway. What we should not do,
IMHO, is use the original logs for analysis and publish only
sanitized logs. I'm aware that this means we won't be able to answer all
questions we would like to answer, simply because the data is too
sensitive. We should also present and discuss the complete sanitizing
process on or-dev before implementing it.
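To make the idea concrete, here is a minimal sketch in Python of what such
a sanitizing step could look like. Everything in it is an assumption for
illustration, not an agreed design: the input is assumed to be in Apache's
"combined" log format, the helper names (new_key, sanitize_line) are made
up, and replacing the client address with a keyed hash (HMAC-SHA256) is
just one possible way to keep sessions linkable without keeping the
address itself. The sketch also drops the time of day, query string,
referrer and user agent.

```python
import hashlib
import hmac
import re
import secrets

# Assumed input: Apache "combined" log format. Only the fields we keep
# are captured; referrer and user agent at the end are never read.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def new_key():
    """Return a fresh random key (hypothetical helper).

    The key is meant to live only in memory for one sanitizing period
    and to be discarded afterwards.
    """
    return secrets.token_bytes(32)


def sanitize_line(line, key):
    """Return a sanitized version of one log line, or None if unparseable.

    The client address is replaced by a truncated HMAC-SHA256 token, so
    requests from the same address map to the same token under one key.
    """
    match = LOG_RE.match(line)
    if match is None:
        # Drop lines we cannot parse rather than passing them through.
        return None
    token = hmac.new(
        key, match.group("ip").encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]
    # Strip the query string, which may carry user-specific data.
    path = match.group("path").split("?", 1)[0]
    return '%s [%s] "%s %s" %s %s' % (
        token,
        match.group("day"),  # date only, time of day removed
        match.group("method"),
        path,
        match.group("status"),
        match.group("size"),
    )
```

If the key is thrown away at the end of each period (say, daily), tokens
from different periods can no longer be linked to each other, and the
original addresses cannot be recovered by hashing all possible addresses.
Whether that is enough, and which fields are safe to keep at all, is
exactly what would need discussion on or-dev.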
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2489#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online