[tor-dev] On the visualization of OONI bridge reachability data

Mon Oct 6 21:52:26 UTC 2014

On 10/6/14, 6:28 PM, Matthew Finkel wrote:
> On Sat, Oct 04, 2014 at 06:27:22PM -0700, M. C. McGrath wrote:
>> These were a few possibilities for visualization that we came up with
>> at the OTF summit (I can send the full notes from that discussion if
>> everyone is okay with it):

Is this something that is different from what is on this pad:
https://pad.riseup.net/p/bridgereachability?

If so please do!

>> - Timelines (by protocol, pool, country)
>> - Pie charts for the above
>> - Timeline/graph of the time it takes for a bridge to be blocked after
>> it is added to TBB (github parser)
> 
> Similar to the next one, I wonder if showing a map corresponding to
> this data would also help. At t0, zero countries block the built-in
> bridges; at t1, only China blocks; at t2, China + Iran block; at t3,
> China + Iran + Syria block; t4 = t3 + Turkey, etc. I'm thinking this
> would be nice in addition to the timeline George sketched (where
> some of the time points are clickable and update the map). I don't
> actually know how difficult this is to make.
> 

I like this idea, though having both the map and the timeline will take
up quite a bit of screen real estate. I think both are useful graphs to
have, and linking the two into one combined view probably does not
require much effort, so I would go for it.
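Linking the timeline to the map mostly comes down to answering "which countries block at time t?" for whichever point is clicked. A minimal sketch, assuming we already know, per country, the date a built-in bridge was first observed blocked; the `FIRST_BLOCKED` dict, its dates, and the function names are hypothetical placeholders, not measured data. The same input also gives the time-to-block variable:

```python
from datetime import date

# Hypothetical placeholder data (NOT measurements): date on which a built-in
# bridge was first observed blocked in each country, None if never blocked.
FIRST_BLOCKED = {
    "CN": date(2014, 1, 10),
    "IR": date(2014, 2, 3),
    "SY": date(2014, 3, 20),
    "TR": date(2014, 4, 2),
    "NL": None,  # control country
}


def blocking_countries(first_blocked, t):
    """Countries already blocking at time t, i.e. the map state for a clicked point."""
    return sorted(cc for cc, d in first_blocked.items() if d is not None and d <= t)


def days_to_block(first_blocked, added):
    """Days between the bridge being added to TBB and being blocked, per country."""
    return {cc: (d - added).days for cc, d in first_blocked.items() if d is not None}


print(blocking_countries(FIRST_BLOCKED, date(2014, 2, 15)))  # ['CN', 'IR']
print(days_to_block(FIRST_BLOCKED, date(2014, 1, 1))["CN"])  # 9
```

Keeping only first-blocked dates means every map frame is derived from one small table, so the timeline and the map cannot drift out of sync.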

>> - Geographic breakdown by region (if enough data points) Could be
>> similar to this map of % of internet users who use Tor by country
>> https://transparencytoolkit.org/tormap.html

[...]

> 
> But, it would also be really cool if we can create a map like this
> based on the reachability of bridges per country per protocol and
> maybe, in addition, color-code/denote how the ISPs/country are
> interfering with the connection (e.g. throttling, DNS cache
> poisoning, IP addr/port blocking).

This would indeed be very cool. One problem is that it's quite hard to
say which protocol is working, especially in cases like China where the
blocking does not happen immediately.

What we can do, however, is draw something like bubbles over every country
showing the percentage of bridges in each category that we have detected
as "not working" in that country at a given time, and whether "not
working" means "Tor cannot bootstrap to 100%", "the connection attempt
failed", or "the connection was reset".

>> - At what point in the tor bootstrapping does it fail (may be
>> difficult to determine, especially anonymized)?
> 
> Yes, but there's already a risk to running ooni-probe (at least right
> now, hopefully this will change in time). We will eventually need
> probes running in most countries if we want a good understanding of
> what network interference is taking place and who is affected.
> 

I don't think it's an issue to publish the point at which the Tor
bootstrap failed, as it doesn't give away any particularly personally
identifiable information. Also keep in mind that at this stage all of
the measurements are being conducted from machines that we have rented
and operate ourselves, so the privacy of the probe operator is not much
of a problem.
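If the probe keeps tor's notice-level log, the point at which bootstrap stopped can be read from its "Bootstrapped N%" lines. A minimal sketch, assuming the log is available as a list of lines; the function name is hypothetical. Anything below 100 means bootstrap did not complete:

```python
import re

# Tor's notice-level log reports progress with lines containing "Bootstrapped N%".
BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")


def bootstrap_reached(log_lines):
    """Highest bootstrap percentage found in the log lines, or 0 if none."""
    reached = 0
    for line in log_lines:
        match = BOOTSTRAP_RE.search(line)
        if match:
            reached = max(reached, int(match.group(1)))
    return reached
```

Matching only the percentage keeps the sketch independent of the exact phase description that follows it in the log line.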

>> - In all visualizations, compare with control (filter, line break,
>> plot alongside, etc.)
>>
>> And the variables we thought would be relevant to visualizations:
>> Protocol
>> Pool
>> Country (and region): Iran, China, Netherlands (control)
>> Time it takes to be blocked
>> Point in bootstrap where it fails
>> Classify the bridges by commercial/residential connection
>> Time we started scanning the bridge, and from where
>>
> 
> Maybe latency measurements per protocol? Initially, I'm thinking
> "the time it takes to download a consensus from the bridge", but
> there are many variables that may affect this. Anyone have a better
> idea?
> 
> I think this mostly covers it. The only addition I can think of right
> now is comparing different control countries against each other (and
> different ISPs within the control countries). Maybe we'll find
> something interesting.
> 

I was thinking more of something like "downloading a resource of [10k,
100k, 1M] from a fixed location", so that the consensus size is not a
variable and we can use this as a benchmark.

What I am looking for are patterns that could be symptoms of throttling
of encrypted/Tor traffic.
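One simple pattern to look for is a protocol whose throughput in the tested country is consistently far below the same protocol measured from a control country. A minimal sketch, assuming throughput samples (bytes/second) grouped by protocol. The 0.5 threshold is an arbitrary assumption, and a low ratio is only a hint of throttling, since path, bridge load and local bandwidth also differ:

```python
from statistics import median


def throttling_suspects(samples, control, threshold=0.5):
    """Protocols whose median throughput is below `threshold` x the control's.

    `samples` and `control` map protocol -> list of throughput values
    (bytes/second) from the tested country and from a control country.
    Returns {protocol: ratio} for the flagged protocols only.
    """
    suspects = {}
    for proto, values in samples.items():
        base = control.get(proto)
        if not values or not base:
            continue  # no data to compare
        base_median = median(base)
        if base_median <= 0:
            continue
        ratio = median(values) / base_median
        if ratio < threshold:
            suspects[proto] = ratio
    return suspects
```

Medians are used rather than means so that a single slow or failed download does not flag a protocol on its own.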

>> It should be relatively simple to make rough versions of a lot of
>> visualizations to see what works once we have a parser/converter that
>> will generate JSONs (or similar) from OONI output that include the
>> variables listed above.
>>
> 
> Is someone already working on this? I'm not really volunteering, merely
> curious if this is in progress. :)
> 

I have written such scripts, but have not yet published them since I
still need to finish cleaning them up.

The kind of data that they end up generating looks something like this:
http://arturo.filasto.net/vizPlayground/bridge_rearchability.csv
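The scripts are not yet published and the CSV columns are not described in this thread, so the record below is not that format. It is a minimal sketch of one possible JSON record carrying the variables listed earlier, with hypothetical field names and example values:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class BridgeMeasurement:
    """One hypothetical converter output row, one field per listed variable."""
    bridge: str                # bridge address or fingerprint
    protocol: str              # e.g. "obfs3" or "vanilla" (example labels)
    pool: str                  # distribution pool label (hypothetical values)
    country: str               # probe country, e.g. "CN" or "NL" (control)
    connection_type: str       # "commercial" or "residential"
    scan_started: str          # ISO 8601 time we started scanning from this probe
    bootstrap_percent: int     # last bootstrap percentage reached (100 = success)
    hours_to_block: Optional[float] = None  # None if never observed blocked


def to_json(rows):
    """Serialize measurements as a JSON list for the visualization code."""
    return json.dumps([asdict(r) for r in rows], sort_keys=True)
```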

>> Are there any other variables that would be particularly helpful to
>> track or visualize? And are there any visualizations (listed or
>> otherwise) that anyone would find particularly helpful?

I have been playing around with this visualization here:
http://arturo.filasto.net/vizPlayground/graph.html

It is still very rough, but the concept is that every cell is a set of
measurements done on a given bridge on a certain date. More sub-cells
inside a cell mean that tests other than "bridge_reachability" were also
run.

I would also like to add another sub-cell to this graph containing the
control measurement results.

The idea is that by looking at this you are able to tell which bridges
are working from which countries.
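The cell/sub-cell layout amounts to grouping measurements by (bridge, date) and then by test, with the control results kept in their own sub-cell. A minimal sketch, assuming hypothetical field names (`bridge`, `date`, `test`, `is_control`, `result`) and one grid per probe country:

```python
from collections import defaultdict


def build_grid(measurements):
    """Group one country's measurements into cells keyed by (bridge, date).

    Each cell maps a test name to its result; control measurements get their
    own sub-cell, labelled "<test> (control)", so they sit next to the
    measurement they are compared with.
    """
    grid = defaultdict(dict)
    for m in measurements:
        cell = grid[(m["bridge"], m["date"])]
        label = f"{m['test']} (control)" if m.get("is_control") else m["test"]
        cell[label] = m["result"]
    return dict(grid)
```

Rendering then only needs to draw one sub-cell per key in each cell, so adding the control sub-cell does not change the drawing logic.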

~ Art.