[metrics-bugs] #34257 [Metrics/Onionperf]: Analyze unusual distribution of time to extend to first hop in circuit

Wed May 20 20:28:22 UTC 2020

#34257: Analyze unusual distribution of time to extend to first hop in circuit
-----------------------------------+--------------------------
     Reporter:  karsten            |      Owner:  metrics-team
         Type:  defect             |     Status:  new
     Priority:  Medium             |  Milestone:
    Component:  Metrics/Onionperf  |    Version:
     Severity:  Normal             |   Keywords:
Actual Points:                     |  Parent ID:
       Points:                     |   Reviewer:
      Sponsor:                     |
-----------------------------------+--------------------------
 I spent some time looking at OnionPerf measurements today. I found
 something that I did not expect: It seems like the time required to build
 the first hop in a circuit has a huge variance and rather unusual
 distribution. I'll attach a graph shortly that visualizes that.

 Looking at that graph, we can see a few things:
  - The three OnionPerf instances perform very differently when extending
 circuits. (I checked the earlier instances op-hk, op-nl, and op-us, and
 they had the same characteristics.)
  - There are huge plateaus for op-us2 and op-hk2 in the first hop graph
 where ''some'' circuits have already been extended and others have not.
 Typically, we'd expect a distribution like op-nl2's, just shifted to the
 right. But that's not the case here. The blue line shows a distinct step
 at around 0.6 seconds, and the red line shows one at around 0.9 seconds.
 The green line also has an unusual shape: it almost flattens at around 0.3
 seconds and then increases linearly until it reaches 100% at around 0.6
 seconds.
  - If we assume that the U.S. and Hong Kong hosts are simply far away from
 many relays in a geographical sense, that doesn't explain why extending to
 the middle node and to the exit is relatively fast even for those two
 hosts. Keep in mind that extending to the second hop requires a round trip
 to the first hop, and that extending to the third hop requires a round trip
 through the first and second hops.
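 The plateaus described above can be located programmatically. One way is to
 compute the ECDF of the first-hop extension times and report gaps between
 consecutive sample values that are wider than some threshold. The sketch
 below is minimal and assumes the per-circuit first-hop times (in seconds)
 have already been extracted from the data file. The function names, the
 `min_width` threshold, and the sample values are hypothetical and not part
 of OnionPerf.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return (xs, fs): the sorted samples and, for each xs[i], the fraction
    of samples that are <= xs[i]."""
    xs = sorted(samples)
    n = len(xs)
    # bisect_right counts all samples <= x, which handles ties correctly.
    fs = [bisect_right(xs, x) / n for x in xs]
    return xs, fs


def find_plateaus(samples, min_width):
    """Return (start, end, fraction) for every gap of at least `min_width`
    seconds between consecutive sample values. Across such a gap the ECDF
    stays flat at `fraction`, i.e. no circuit finished its first hop there."""
    xs, fs = ecdf(samples)
    plateaus = []
    for i in range(len(xs) - 1):
        if xs[i + 1] - xs[i] >= min_width:
            plateaus.append((xs[i], xs[i + 1], fs[i]))
    return plateaus


# Hypothetical first-hop extension times in seconds, not real OnionPerf data.
first_hop_times = [0.12, 0.15, 0.18, 0.20, 0.61, 0.62, 0.65, 0.90]
for start, end, frac in find_plateaus(first_hop_times, min_width=0.2):
    print(f"flat at {frac:.0%} between {start:.2f}s and {end:.2f}s")
```

 A plateau found this way only shows that no circuit in the sample finished
 its first hop in that interval. It does not explain why, so the next step
 would be to compare the circuits on either side of each gap.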

 What's going on here? What properties of these relays should we be looking
 at? I already looked at:
  - consensus weight,
  - date/time of building these circuits, and
  - whether these are just a small number of guards being reused over and
 over;
 but none of them explained the shape of these ECDFs. I'm going to attach
 the data file that this graph is based on in case others want to take a
 look.
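 The guard-reuse check listed above can be sketched as follows. Group the
 first-hop times by guard, then measure how much of the distribution comes
 from the most-used guards. This is a minimal sketch, assuming each circuit
 has been reduced to a hypothetical (guard fingerprint, first-hop seconds)
 pair. The helper names and the example data are illustrative only.

```python
from collections import defaultdict
from statistics import median


def summarize_by_guard(circuits):
    """Group first-hop extension times by guard.

    `circuits` is an iterable of (guard_fingerprint, first_hop_seconds)
    pairs. Returns {fingerprint: (circuit_count, median_seconds)} ordered
    from most-used to least-used guard; ties keep first-seen order.
    """
    times = defaultdict(list)
    for guard, seconds in circuits:
        times[guard].append(seconds)
    summary = {g: (len(ts), median(ts)) for g, ts in times.items()}
    # sorted() stays stable with reverse=True, so equal counts keep order.
    return dict(sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True))


def top_guard_share(circuits, k):
    """Fraction of circuits whose first hop is one of the k most-used guards."""
    circuits = list(circuits)
    if not circuits:
        return 0.0
    summary = summarize_by_guard(circuits)
    top = sum(count for count, _ in list(summary.values())[:k])
    return top / len(circuits)


# Hypothetical data: guard fingerprints shortened to single letters.
circuits = [("A", 0.60), ("A", 0.62), ("B", 0.20), ("A", 0.58), ("C", 0.90)]
for guard, (count, med) in summarize_by_guard(circuits).items():
    print(f"{guard}: {count} circuits, median {med:.2f}s")
print(f"top-1 guard share: {top_guard_share(circuits, 1):.0%}")
```

 Sorting by count makes it easy to see whether a handful of guards account
 for the circuits on either side of a plateau.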

 And would it make sense to run an OnionPerf instance on another set of
 hosts that are geographically close to our current hosts? Maybe the issue
 is related to how these hosts are set up, including their network. (If
 that other set of hosts produces different results, we would still have to
 investigate how that affects our overall measurements of things like time
 to first byte or throughput.)

 This is relevant to Sponsor 59 because we need to make sure that our
 current measurements are a solid baseline for future experiments.
 Classifying this as a potential defect.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/34257>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online