[tor-scaling] Analyzing the Predictive Capability of Tor Metrics
Rob Jansen
rob.g.jansen at nrl.navy.mil
Thu Jul 4 18:03:51 UTC 2019
> On Jul 3, 2019, at 1:56 AM, Roger Dingledine <arma at torproject.org> wrote:
>
> Speaking of this one, I encouraged Rob to do an experiment where he
> puts load through each relay for a 20-second burst. The goal would
> be to learn how much extra capacity the relays "discover" this way:
> is it 5%? 50%? 500%? Is it evenly distributed?
>
> It will help him with the load balancing paper he's working on, it will
> give us better intuition about how far off our self-measured relay
> capacities are, and it will have minimal impact on the network to do
> the experiment once or twice.
I am still planning to write up an experiment plan, submit it to the Tor Research Safety Board, and share it (probably with tor-relays@) before running the experiment.
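To make the burst analysis concrete, here is a rough Python sketch of the computation we would run afterward. The relay names and numbers are made up, and burst_goodput and observed_bandwidth are hypothetical inputs in bytes per second:

    # Rough sketch: how much extra capacity does a 20-second burst
    # "discover" relative to a relay's self-reported observed bandwidth?
    # All inputs below are made up; units are bytes per second.

    def discovered_capacity(burst_goodput, observed_bandwidth):
        # 0.05 -> 5% extra, 0.50 -> 50% extra, 5.00 -> 500% extra
        return (burst_goodput - observed_bandwidth) / observed_bandwidth

    measurements = {
        "relayA": (120e6, 80e6),  # (burst goodput, observed bandwidth)
        "relayB": (55e6, 50e6),
    }
    for relay, (burst, observed) in measurements.items():
        print(f"{relay}: {discovered_capacity(burst, observed):+.0%} extra capacity")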
> On Jul 4, 2019, at 10:08 AM, George Kadianakis <desnacked at riseup.net> wrote:
>
> Perhaps one way could be to play with the utilization concept again but
> go per-relay this time, and see how well utilized individual relays are
> over time. How do utilization levels differ between slow and fast
> relays? What about different relay types?
Please keep in mind, as this analysis is conducted, that we cannot rely on the reported observed bandwidth (the highest 10-second average relay goodput over each 24-hour period) to be accurate, particularly in periods of low network utilization. As a relay is used less often, it observes fewer bursts, and it becomes less likely that the observed bandwidth has reached the relay's true capacity.
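For reference, here is a minimal sketch of how that statistic is computed; per_second_bytes is a hypothetical trace of the bytes a relay forwarded in each second of one day:

    # Minimal sketch of the observed bandwidth statistic: the highest
    # 10-second average goodput over a 24-hour window. If the relay is
    # never saturated during the day, this maximum understates its true
    # capacity, which is the problem described above.
    def observed_bandwidth(per_second_bytes, window=10):
        best = 0.0
        for i in range(len(per_second_bytes) - window + 1):
            avg = sum(per_second_bytes[i:i + window]) / window
            best = max(best, avg)
        return best  # bytes per second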
I believe this is particularly problematic for fast relays, where things like relay bandwidth self-tests and torperf measurements are unlikely to fully utilize the relay.
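If someone does run the per-relay utilization analysis George suggests, a sketch of it might look like the following. The records and the fast/slow cutoff are made up, and note that the denominator carries the inaccuracy described above:

    from statistics import mean

    # Sketch of per-relay utilization: mean consumed goodput divided by
    # observed bandwidth. Records and the fast/slow cutoff are made up,
    # and the denominator may itself understate true capacity.
    relays = [
        (5e6, 2e6),    # (observed bandwidth, mean goodput), bytes/second
        (50e6, 15e6),
        (200e6, 40e6),
    ]

    def bucket(observed):
        return "fast" if observed >= 50e6 else "slow"

    groups = {}
    for observed, used in relays:
        groups.setdefault(bucket(observed), []).append(used / observed)

    for name, utils in groups.items():
        print(f"{name} relays: mean utilization {mean(utils):.0%}")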
The experiment that Roger described and that I plan to run will help us get a better handle on this issue.
Peace, love, and positivity,
Rob