[tor-bugs] #33076 [Metrics/Analysis]: Graph onionperf and consensus information from Rob's experiments
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Feb 13 12:34:08 UTC 2020
#33076: Graph onionperf and consensus information from Rob's experiments
-----------------------------------------+---------------------------------
 Reporter:  mikeperry                    |           Owner:  metrics-team
     Type:  task                         |          Status:  needs_review
 Priority:  Medium                       |       Milestone:
Component:  Metrics/Analysis             |         Version:
 Severity:  Normal                       |      Resolution:
 Keywords:  metrics-team-roadmap-2020Q1, |   Actual Points:  3
            sbws-roadmap                 |
Parent ID:  #33121                       |          Points:  6
 Reviewer:                               |         Sponsor:
-----------------------------------------+---------------------------------
Comment (by dennis.jackson):
Replying to [comment:24 karsten]:
== 24 Hour Moving Average
> I like your percentiles graph with the moving 24 hour window. We should
> include that graph type in our candidate list for graphs to be added to
> OnionPerf's visualization mode. Is that moving 24 hour window a standard
> visualization, or did you further process the data I gave you?
At a high level: I'm loading the data into Pandas and then using the
`rolling` function to compute statistics over a window. It's pretty
flexible and supports different weighting strategies for the window, but I
used 'uniform' here. The code is contained in the Python notebook I linked
at the end of my post.
Excerpt:
{{{
# 24-hour rolling window over the time-indexed measurements; only emit a
# percentile once at least `threshold` samples fall inside the window.
time_period = 60 * 60 * 24
threshold = 10
p95 = lambda x: x.rolling(f'{time_period}s', min_periods=threshold).dl.quantile(0.95)
}}}
The resulting data can be plotted as a time series in your graphing
library of choice :).
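To make that concrete, here is a rough sketch of how the rolling
percentiles could be computed and plotted end to end with Pandas and
matplotlib. The DataFrame layout (a DatetimeIndex with a `dl` column of
download times) and the placeholder data are just illustrative stand-ins
for the real dataset:
{{{
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder data standing in for the real OnionPerf measurements:
# one row per measurement, indexed by start time, 'dl' = download time (s).
idx = pd.date_range('2020-01-01', periods=5000, freq='5min')
df = pd.DataFrame({'dl': np.random.default_rng(0).lognormal(0.0, 0.5, len(idx))},
                  index=idx)

time_period = 60 * 60 * 24   # 24-hour window
threshold = 10               # minimum samples before emitting a value
rolling = df['dl'].rolling(f'{time_period}s', min_periods=threshold)

stats = pd.DataFrame({'p5':  rolling.quantile(0.05),
                      'p50': rolling.quantile(0.50),
                      'p95': rolling.quantile(0.95)})

stats.plot()
plt.ylabel('download time (s)')
plt.show()
}}}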
== Measuring Latency
> Regarding the dataset behind bandwidth measurements, I wonder if we
> should kill the 50 KiB downloads in deployed OnionPerfs and only keep the
> 1 MiB and 5 MiB downloads. If we later think that we need time-to-50KiB,
> we can always obtain that from the tgen logs. The main change would be
> that OnionPerfs consume more bandwidth and also put more load on the Tor
> network. The effect for graphs like these would be that we'd have 5 times
> as many measurements.
I think that is definitely worth thinking about, as 50 KB does seem too
small to infer anything about bandwidth. It is maybe worth considering the
cost of circuit construction, though. For example, if we open a circuit
for a latency measurement, we could use Arthur's strategy of fetching HEAD
only, and it might be worth using that circuit for a series of
measurements over a couple of minutes, which would give us more reliable
"point in time" data without any additional circuit construction overhead.
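Purely as an illustration of that idea (not how OnionPerf/tgen actually
issues requests), repeated HEAD-only probes reusing one session over the
local Tor SOCKS port might look roughly like this; the proxy address,
target URL, probe count and spacing are all made up:
{{{
import time
import requests  # needs the socks extra (PySocks) installed

SOCKS = 'socks5h://127.0.0.1:9050'  # assumed local Tor SOCKS port
URL = 'https://example.com/'        # placeholder measurement target

# Reuse one session so the probes ride the same connection/circuit
# (subject to Tor's own stream isolation and circuit lifetime rules).
session = requests.Session()
session.proxies = {'http': SOCKS, 'https': SOCKS}

latencies = []
for _ in range(8):                  # a couple of minutes of probes
    start = time.monotonic()
    session.head(URL, timeout=30)
    latencies.append(time.monotonic() - start)
    time.sleep(15)

print(sorted(latencies))
}}}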
== August Measurement Success Rate
> But I think (and hope) that you're wrong about measurements not having
> finished. If DATAPERC100 is non-null that actually means that the
> measurement reached the point where it received 100% of expected bytes.
> See also the [https://metrics.torproject.org/collector.html#type-torperf
> Torperf and OnionPerf Measurement Results data format description].
You are quite right! I looked back at my code and, whilst I was correctly
checking that DATAPERC100 is non-null to imply success, I also found a
trailing `}` which placed my check inside the wrong `if` clause. My bad!
Rerunning with the fix shows only 29 measurements failed to finish in
August. Much, much healthier!
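For anyone checking their own parsing code: the success test is just a
null test on that field. A minimal sketch, assuming each result line has
been parsed into a Python dict keyed by the Torperf field names (the
notebook is structured differently):
{{{
def measurement_finished(row):
    # Per the Torperf/OnionPerf results format, DATAPERC100 is only set
    # once 100% of the expected bytes have been received.
    return row.get('DATAPERC100') is not None
}}}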
== Number of Measurements in August
> Are you sure about that 10k ttfb measurements number for the month of
> August? In theory, every OnionPerf instance should make a new measurement
> every 5 minutes. That's 12*24*31 = 8928 measurements per instance in
> August, or 8928*4 = 35712 measurements performed by all four instances in
> August. So, okay, not quite 10k, but also not that many more. We should
> spin up more OnionPerf instances as soon as it has become easier to
> operate them.
Sorry, this was sloppy and incorrect wording on my part: "month of August"
-> "experimental period from August 4th to August 19th". There are 15k
attempted measurements in this window; however, op-hk did not achieve any
successful connections, so there are only ~10k successful measurements in
the dataset.
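As a rough sanity check on those numbers: at one measurement per instance
every 5 minutes, four instances over the ~15-day window would attempt
about 12*24*15*4 = 17,280 measurements, so 15k attempted (and ~10k
successful with op-hk effectively offline) is in the expected ballpark.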
== How many is enough?
> What's a good number to keep running continuously, in your opinion? 10?
> 20? And maybe we should consider deploying more than 1 instance per host
> or data center, so that we have more measurements with comparable network
> properties.
I think it would be worth pulling in Mike (for the congestion-related
work) and the network health team (#33178) and thinking about this in
terms of output statistics rather than input measurements. Possible
example:
* For a given X `{minute, hour, day}` period, we want to measure, for
`{any circuit, circuits using this guard, circuits using this exit}`, the
`{probability of time out, p5-p50-p95 latency, p5-p50-p95 bandwidth}` with
a 90% confidence interval narrower than `{1%, 500ms, 500 KB/s}`.
This gives us a rolling target in terms of the measurements we want to
make, varying with network conditions and with how fine-grained we would
like the statistics to be for a given time period. We could estimate the
number of samples required (using the existing datasets) for each of these
statistics, factor in the cost per measurement, and work out what is
feasible for long-term monitoring and short-term experiments.
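As a rough sketch of the kind of sample-size estimation I mean, one could
bootstrap subsets of an existing latency dataset and watch how the width
of a 90% confidence interval for the 95th percentile shrinks as the number
of measurements per window grows. The distribution below is a stand-in,
not real data:
{{{
import numpy as np

rng = np.random.default_rng(0)

def ci_width(samples, stat=lambda s: np.quantile(s, 0.95),
             n_boot=1000, level=0.90):
    # Width of a bootstrap confidence interval for `stat` over `samples`.
    boots = [stat(rng.choice(samples, size=len(samples), replace=True))
             for _ in range(n_boot)]
    lo, hi = np.quantile(boots, [(1 - level) / 2, (1 + level) / 2])
    return hi - lo

# Placeholder latency distribution (seconds); in practice this would be
# drawn from the existing OnionPerf measurements for a comparable period.
latencies = rng.lognormal(mean=0.0, sigma=0.5, size=20000)

for n in (50, 100, 500, 2000):
    subset = rng.choice(latencies, size=n, replace=False)
    print(f'{n:5d} samples -> 90% CI width for p95 latency: '
          f'{ci_width(subset):.3f}s')
}}}
Repeating that per statistic and per period length, and multiplying by the
per-measurement cost, would give the kind of feasibility numbers I mean.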
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33076#comment:25>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online