[tor-dev] moved from [Tor-censorship-events] Improving the censorship event detector.
Joss Wright
joss at pseudonymity.net
Thu Aug 20 15:57:55 UTC 2015
On Thu, Aug 20, 2015 at 09:09:23AM -0400, l.m wrote:
> Hi,
>
> As some of you may be aware, the mailing list for censorship events
> was recently put on hold indefinitely. This appears to be due to the
> detector producing too many false positives in its current
> implementation. It also raises the question of the purpose of such a
> mailing list. Who are the stakeholders? What do they gain from an
> improvement?
>
> I've read some of the documentation about this. As far as I can tell,
> at a minimum an `improvement` in the event detector would need to:
>
> - reduce false positives
> - distinguish between Tor network reachability and Tor network
> interference
> - enable/promote client participation through the submission of
> results from an ephemeral test (the test itself being provably
> correct and valid)
>
> In order to be of use to the researchers it needs greater analysis
> capability. Is it enough to say censorship is detected? By this point
> the analysis is less interesting, because the discourse that itself
> led to the Tor use is probably evident (or it becomes harder to
> find). On the other hand, if a researcher is aware of some emerging
> trend, they may predict the censorship event by predicting the use of
> Tor. This may also be of use in the analysis of other events.
>
> - should detect more than just censorship
> - accept input from researchers
>
> From the tech reports it looks like Philipp has a plan for an
> implementation of the tests noted above. Only the format of the
> results submission is unknown.
>
> - provide client test results to tor project developers
> - make decision related data available
> Regards
> --leeroy
Hi,
These are well-identified issues. We've been working here on a way to
improve the current filtering detection approach, and several of the
points above are things that we're actively hoping to work into our
approach. Differentiating 'filtering' from 'other events that affect Tor
usage' is tricky, and will most likely have to rely on other
measurements from outside Tor. We're currently looking at ways to
construct models of 'normal' behaviour in a way that incorporates
multiple sources of data.
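As a rough illustration of that kind of baseline modelling (not the
method from our paper), a robust rolling-window detector over a daily
user-count series could look like the sketch below. The counts,
window size, and threshold are invented for illustration:

```python
# Minimal sketch: flag days where a daily Tor user-count series
# deviates strongly from a rolling baseline built on the preceding
# window, using the median and median absolute deviation (MAD) so
# that a single outlier does not poison the baseline.
# All numbers here are invented for illustration.

def flag_anomalies(series, window=7, threshold=3.5):
    """Return indices whose value deviates from the rolling median of
    the previous `window` points by more than `threshold` robust
    z-scores."""
    anomalies = []
    for i in range(window, len(series)):
        past = sorted(series[i - window:i])
        median = past[window // 2]
        mad = sorted(abs(x - median) for x in past)[window // 2]
        if mad == 0:
            continue  # perfectly flat baseline; avoid division by zero
        # 0.6745 rescales MAD to be comparable to a standard deviation
        z = 0.6745 * (series[i] - median) / mad
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies

# Synthetic daily user counts with a sharp drop starting on day 10.
counts = [1000, 1020, 990, 1010, 1005, 995, 1015,
          1000, 1010, 1005, 200, 210]
print(flag_anomalies(counts))  # -> [10, 11]
```

A model incorporating multiple data sources would run something like
this per-source and then combine the per-source scores, rather than
alerting on any single series.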
We have a paper up on arXiv that might be of interest. I'd be interested
to be in touch with anyone who's actively working on this. (We have
code, and would be very happy to work on getting it into production.)
I've shared the paper with a few people directly, but not here on the
list.
arXiv link: http://arxiv.org/abs/1507.05819
We were looking at any anomalies, not only pure Tor-based filtering
events. For the broader analysis, significant shifts in Tor usage are
very interesting. It's therefore useful to detect a range of unusual
behaviours occurring around Tor, and have a set of criteria within that
to allow differentiating 'hard' filtering events from softer anomalies
occurring due to other factors.
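One crude way to express such criteria, purely as an illustration and
not as the classifier we use, is to look at how abruptly an anomalous
dip develops; the thresholds below are invented:

```python
# Illustrative sketch: label an anomalous dip in a usage series as a
# 'hard' event (abrupt, deep drop, as one might expect from blocking)
# or a 'soft' anomaly (gradual drift, as from holidays or unrelated
# outages). The thresholds are invented for illustration.

def classify_dip(baseline, values, hard_drop=0.5, hard_days=2):
    """baseline: expected daily count before the anomaly.
    values: observed counts during the anomaly, one per day.
    Returns 'hard' if usage falls below (1 - hard_drop) of baseline
    within `hard_days` days of the anomaly starting, else 'soft'."""
    floor = baseline * (1 - hard_drop)
    for day, v in enumerate(values, start=1):
        if v < floor:
            return "hard" if day <= hard_days else "soft"
    return "soft"

print(classify_dip(1000, [950, 300, 280]))            # -> hard
print(classify_dip(1000, [900, 800, 700, 600, 450]))  # -> soft
```

In practice the 'soft' cases are exactly where outside measurements
(non-Tor traffic data, known holidays, reported outages) would be
consulted before raising an alert.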
Joss
--
Dr. Joss Wright | Research Fellow
Oxford Internet Institute, University of Oxford
http://www.oii.ox.ac.uk/people/?id=176