[tor-bugs] #2489 [Website]: Set up new web server logging and log analysis infrastructure

Tue Feb 8 09:47:12 UTC 2011

#2489: Set up new web server logging and log analysis infrastructure
-------------------------+--------------------------------------------------
 Reporter:  karsten      |       Owner:  phobos
     Type:  enhancement  |      Status:  new   
 Priority:  normal       |   Milestone:        
Component:  Website      |     Version:        
 Keywords:               |      Points:        
   Parent:               |  
-------------------------+--------------------------------------------------

Comment(by karsten):

 I'm mostly ignoring this task because, as I hear, it's on Runa's list.  A
 few thoughts anyway:

 I think for metrics we made a reasonable decision to use only the data
 that we publish.  (I'm a terrible fortuneteller, so I cannot promise we'll
 never have to break this principle.  So far we haven't, and if we ever
 have to, we'll tell the world/or-dev beforehand.)  This decision ensures
 that we're
 not collecting too much data in the first place.  We may even get feedback
 from the community if we do collect too much data.  Also, we're less
 susceptible to attacks, because we don't have any secret data.  And it's
 a question of fairness towards other researchers who don't happen to run a
 Tor network and who want to do Tor research.

 The situation with web logs may be comparable to the bridge descriptor
 situation.  We cannot publish raw bridge descriptors.  Instead, we're
 spending significant resources, mostly developer time, on sanitizing
 bridge descriptors before publishing and analyzing them.  Most
 importantly, we're not analyzing the original descriptors at all.

 How about we implement a similar sanitizing process for web logs?  Maybe
 we can replace sensitive data with other data that allows us to keep track
 of user sessions without giving away any other user data.  I admit it will
 eat a lot of development resources (here: Runa).  In addition to that, we
 should look at the original logs and reduce the details that are too
 sensitive and that we're not going to use anyway.  What we should not do,
 IMHO, is use the original logs for analysis and publish only
 sanitized logs.  I'm aware that this means we won't be able to answer all
 questions we would like to answer, simply because the data is too
 sensitive.  We should also present and discuss the complete sanitizing
 process on or-dev before implementing it.
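
 To make the idea concrete, here is a minimal sketch of what one such
 sanitizing step might look like.  It is only an illustration under stated
 assumptions, not a process anyone has agreed on: it assumes Apache-style
 "common"/"combined" log lines, and the names LOG_LINE, pseudonym,
 sanitize_line and sanitize are made up for this example.  Client addresses
 are replaced with a keyed-hash token so requests from the same address can
 still be grouped within one batch, while query strings, referrers, user
 agents and the time of day are dropped.

```python
import hashlib
import hmac
import os
import re

# Hypothetical pattern for Apache "common"/"combined" log lines; the real
# server's log format is not given in the ticket.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def pseudonym(ip: str, key: bytes) -> str:
    """Map an address to a short keyed-hash token."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def sanitize_line(line: str, key: bytes):
    """Return a sanitized line, or None if the line does not parse."""
    m = LOG_LINE.match(line)
    if m is None:
        return None  # drop anything we cannot parse rather than leak it
    parts = m.group("request").split(" ")
    method = parts[0]
    # Keep only the path: query strings may carry user-supplied data.
    path = parts[1].split("?", 1)[0] if len(parts) > 1 else "-"
    # Keep only the date; assumes timestamps like 08/Feb/2011:09:47:12 +0000.
    day = m.group("ts").split(":", 1)[0]
    # Referrer and user agent are never copied into the output.
    return '%s [%s] "%s %s" %s %s' % (
        pseudonym(m.group("ip"), key), day, method, path,
        m.group("status"), m.group("size"),
    )


def sanitize(lines, key=None):
    """Sanitize an iterable of log lines, using one key per batch."""
    # A fresh random key per batch means tokens cannot be linked across
    # batches once the key has been thrown away.
    key = key if key is not None else os.urandom(32)
    for line in lines:
        out = sanitize_line(line, key)
        if out is not None:
            yield out
```

 With a fresh key per batch (say, per day) that is discarded afterwards,
 the tokens group requests within that batch but cannot be mapped back to
 addresses or linked across batches.  Whether even that much linkability
 is acceptable is exactly the kind of question to settle on or-dev first.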

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2489#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online