[tor-scaling] Analyzing the Predictive Capability of Tor Metrics
Dennis Jackson
djackson at mozilla.com
Thu Jul 4 23:39:39 UTC 2019
Hi all,
I've been doing some data analysis this week, using both the torperf
dataset and a more recent, higher resolution dataset from Arthur Edelstein.
Tomorrow, I plan to write it up and email tor-scaling with the writeup
and links to scripts / compressed datasets so everyone can explore.
However, I wanted to share these graphs early, as I think they at least
answer the 2015 question.
Graph 1:
<https://send.firefox.com/download/b8b8fe16a6fde395/#Kc2zzIcu3cx04G7TVzcTdA>
Latency heat map by measurement server (one dot is one measurement; exit
node circuits only).
Graph 2:
<https://send.firefox.com/download/8b04e417b9093a9f/#X3UI9wcQpzn3YQQp46tP7w>
As before, zoomed in to the 12 months around Jan 2015.
So I think siv's ISP change might have had more impact than previously
thought, as neither of the other two measurement servers shows any real
delta.
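As a rough illustration of the kind of per-server comparison behind the siv observation (not the scripts promised for tomorrow), one could fit a single mean shift to each measurement server's series and compare the size of the jump. This minimal Python sketch assumes the data has already been reduced to hypothetical `(server, timestamp, latency)` tuples, for example one daily median per server; the function names and the least-squares single-change-point method are illustrative choices, not part of the torperf tooling.

```python
from collections import defaultdict
from statistics import mean


def best_split(values):
    """Least-squares single change point: return (k, mean_before, mean_after)
    for the split values[:k] / values[k:] with the smallest total squared error."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    best = None
    for k in range(1, len(values)):
        left, right = values[:k], values[k:]
        m_left, m_right = mean(left), mean(right)
        sse = sum((v - m_left) ** 2 for v in left) + sum((v - m_right) ** 2 for v in right)
        # Strict '<' keeps the earliest split when several fit equally well.
        if best is None or sse < best[0]:
            best = (sse, k, m_left, m_right)
    _, k, m_left, m_right = best
    return k, m_left, m_right


def shift_per_server(measurements):
    """measurements: iterable of (server, timestamp, latency) tuples (hypothetical format).
    Returns {server: (first_timestamp_after_shift, mean_after - mean_before)}."""
    by_server = defaultdict(list)
    for server, ts, latency in measurements:
        by_server[server].append((ts, latency))
    result = {}
    for server, rows in by_server.items():
        rows.sort()  # order each server's series by time
        times = [ts for ts, _ in rows]
        latencies = [lat for _, lat in rows]
        k, before, after = best_split(latencies)
        result[server] = (times[k], after - before)
    return result


# Made-up example: "siv" jumps from 1.0 s to 2.0 s at t=5; the other two stay flat.
data = [("siv", t, 1.0 if t < 5 else 2.0) for t in range(10)]
data += [(name, t, 1.0) for name in ("server_b", "server_c") for t in range(10)]
print(shift_per_server(data))
```

With daily medians, the returned timestamp is the first day of the "after" segment, which also speaks to whether a shift happened on a single day. A large delta for one server and near-zero deltas for the others (whose reported split point is then meaningless) would be consistent with a server-local cause such as an ISP change. The search is O(n²), which is fine for a few thousand daily points.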
DDoS attacks also show up pretty clearly in Graph 1, and there's some
strange discrete banding in the early days.
More to follow tomorrow!
Best,
Dennis
On Thu, Jul 4, 2019 at 7:08 AM George Kadianakis <desnacked at riseup.net> wrote:
>>
>>> Mike Perry <mikeperry at torproject.org> writes:
>>>
>>> > At Mozilla All Hands, we hoped to find a correlation between the amount
>>> > of load on the Tor network and its historical performance.
>>> >
>>> > Unfortunately, while there did appear to be periods of time where this
>>> > correlation held, we discovered a major historical discontinuity in
>>> > this correlation. We have some guesses that we need to investigate:
>>> >
>>> > https://lists.torproject.org/pipermail/tor-scaling/2019-July/000053.html
>>> >
>>>
>>> You mean the "start of 2015" artifact, right? It would be nice to see
>>> some more zoomed-in graphs. Like did the change happen over a single
>>> day? Is the R code for these graphs somewhere online?
>>>
>>> I'd like to add "changes to bw auth code, nodes or bandwidth weights" as
>>> another possible guess. E.g., I think that's when maatuska got shut down,
>>> according to this graph:
>>> https://metrics.torproject.org/totalcw.html?start=2014-12-20&end=2015-03-10
>>>
>>> I also tried to check the onion service traffic during those days and
>>> noticed that those graphs were introduced at almost exactly that time.
>>> Could there have been some change in the metrics infrastructure around then?
>>>
>>> https://metrics.torproject.org/hidserv-rend-relayed-cells.html?start=2014-04-05&end=2015-07-04
>>>
>>> > So, how can we tell what factors actually really contribute to the
>>> > performance of the Tor network? Let's use statistics.
>>> >
>>> > Let's start off by calling Tor performance our dependent variable.
>>> >
>>>
>>> By "Tor performance" here you mean "latency" and "throughput", which do
>>> not take into account "reliability". I think as a separate investigation
>>> here it would be interesting to see how the below "independent
>>> variables" impact timeout and failure graphs like this one:
>>>
>>> https://metrics.torproject.org/torperf-failures.html?start=2012-04-05&end=2019-07-04&server=public&filesize=50kb
>>>
>>> > Based on the brainstorming at Mozilla, and in the meeting on Friday, we
>>> > have a few candidate independent variables that influence performance:
>>> > 1. Total Utilization
>>> > 2. Bottleneck Utilization (Exit or Guard, whichever is scarce)
>>> > 3. Total Capacity
>>> > 4. Exit Capacity
>>> > 5. Load Balancing
>>> >
>>>
>>> I think capacity and utilization based metrics are a big part of the
>>> equation here, but they assume that Tor is a perfect byte-pushing
>>> network of pipes. Seeing how these pipes get chosen (load balancing/path
>>> selection) and how well they get used (scheduler and other
>>> implementation details like bugs) also seems important.
>>>
>>> The first four variables here seem well defined but what is "Load
>>> balancing"? How do we define this in a way that is robust and rankable?
>>>
>>> Perhaps one way could be to play with the utilization concept again but
>>> go per-relay this time, and see how well utilized individual relays are
>>> over time. How does the utilization level differ between slow and fast
>>> relays? What about different relay types?
>>>
>>> ---
>>>
>>> Interesting stuff all around! We indeed have tons of data from our
>>> network over more than a decade. We should learn to put more of it to
>>> good use.
>>
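To make the dependent/independent-variable framing above concrete, here is a minimal, stdlib-only Python sketch that fits latency against a few of the candidate variables by ordinary least squares. Everything in it is an assumption for illustration: the function names, the per-period inputs, and in particular reading "Bottleneck Utilization (Exit or Guard, whichever is scarce)" as the utilization of whichever position has less total capacity. Load Balancing is left out because, as noted in the thread, it does not yet have a robust, rankable definition.

```python
def bottleneck_utilization(guard_used, guard_cap, exit_used, exit_cap):
    """Utilization of the scarcer position. Assumption: 'scarce' means the
    position (guard or exit) with less total capacity."""
    if exit_cap <= guard_cap:
        return exit_used / exit_cap
    return guard_used / guard_cap


def ols(X, y):
    """Ordinary least squares with an intercept term.

    X: list of rows (one per time period), each a list of predictor values.
    y: list of observed latencies, one per row.
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting and returns [intercept, b_1, ..., b_k].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(p):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * c for a, c in zip(A[i], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up example: latency = 2 + 3*x1 - 1*x2 exactly, so the fit recovers the coefficients.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2, 5, 1, 4, 5]
print(ols(X, y))
```

Before comparing coefficient sizes to rank the variables, the predictors would need to be on a common scale (e.g. z-scores), and closely related predictors such as Total Capacity and Exit Capacity can make the system near-singular. Since these are time series, autocorrelated residuals would also make plain OLS standard errors overconfident; the sketch only produces point estimates.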