[tor-bugs] #33255 [Metrics]: Review existing graphing code
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Feb 25 11:24:11 UTC 2020
#33255: Review existing graphing code
-----------------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: task | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics | Version:
Severity: Normal | Resolution:
Keywords: metrics-team-roadmap-2020Q1 | Actual Points:
Parent ID: #33327 | Points: 1
Reviewer: | Sponsor: Sponsor59
-----------------------------------------+------------------------------
Changes (by karsten):
* status: new => needs_review
Comment:
Here's my review of OnionPerf commit a64b0e6, authored on 2019-10-24,
still the latest commit in master as of 2020-02-25.
== 1. External dependencies like plotting libraries
The main Python requirements for the `visualize` subcommand are scipy,
numpy, and matplotlib. The current versions as installed in my buster VM
are:
{{{
ii python-numpy 1:1.16.2-1 amd64
Numerical Python adds a fast array facility to the Python language
ii python-scipy 1.1.0-7 amd64
scientific tools for Python
ii python-matplotlib 2.2.3-6 amd64
Python based plotting system in a style similar to Matlab
}}}
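A quick way to double-check which versions actually get imported (the Debian package versions above and the Python-visible versions can drift, e.g. with pip installs) is a small probe script; this is a generic sketch, not part of OnionPerf:

```python
import importlib

# Report the importable versions of OnionPerf's plotting dependencies,
# falling back gracefully if a package is missing.
for name in ("numpy", "scipy", "matplotlib"):
    try:
        mod = importlib.import_module(name)
        print(name, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(name, "not installed")
```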
== 2. Internal interdependencies with other OnionPerf code parts
Most of the visualization code is in `onionperf/visualization.py`, with a
tiny part in `onionperf/onionperf` for parsing arguments to the
`visualize` subcommand and calling code in `onionperf/visualization.py`.
== 3. User interface with possible parameters
The `visualize` subcommand has the following arguments:
{{{
$ onionperf visualize -h
usage: onionperf visualize [-h] -d PATH LABEL [-p STRING] [-f LIST]
Loads an OnionPerf json file, e.g., one produced with the `analyze`
subcommand, and plots various interesting performance metrics to PDF files.

optional arguments:
  -h, --help            show this help message and exit
  -d PATH LABEL, --data PATH LABEL
                        Append a PATH to a onionperf.analysis.json analysis
                        results file, and a LABEL that we should use for the
                        graph legend for this dataset (default: None)
  -p STRING, --prefix STRING
                        a STRING filename prefix for graphs we generate
                        (default: None)
  -f LIST, --format LIST
                        A comma-separated LIST of color/line format strings
                        to cycle to matplotlib's plot command (see
                        matplotlib.pyplot.plot) (default: k-,r-,b-,g-,c-,m-,
                        y-,k--,r--,b--,g--,c--,m--,y--,k:,r:,b:,g:,c:,m:,y:,
                        k-.,r-.,b-.,g-.,c-.,m-.,y-.)
}}}
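The `-f` default above shows the intended behavior: format strings are reused round-robin when there are more data sets than formats. An illustrative sketch of that cycling (not OnionPerf's actual code; the labels and the shortened format list are made up):

```python
from itertools import cycle

# Matplotlib-style format strings, reused round-robin across data sets,
# mimicking the -f/--format behavior described in the help output.
formats = "k-,r-,b-,g-,c-,m-,y-".split(",")   # shortened default list
labels = ["set-%d" % i for i in range(9)]     # nine hypothetical data sets

pairs = list(zip(labels, cycle(formats)))
for label, fmt in pairs:
    print(label, fmt)
```

With nine data sets and seven formats, the eighth data set wraps around to `k-` again.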
It's worth noting that the `-d PATH LABEL` argument can be given multiple
times to plot multiple data sets as different CDFs or time series.
For example, the following command produces visualizations of measurements
performed on 2019-01-11, 2019-01-21, and 2019-01-31 as three different
data sets:
{{{
onionperf visualize \
-d 2019-01-11.onionperf.analysis.json.xz 2019-01-11 \
-d 2019-01-21.onionperf.analysis.json.xz 2019-01-21 \
-d 2019-01-31.onionperf.analysis.json.xz 2019-01-31
}}}
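For context, a repeatable two-valued argument like `-d PATH LABEL` can be declared in argparse roughly as follows; this is a minimal sketch, not the actual code in `onionperf/onionperf`:

```python
import argparse

# A repeatable -d/--data argument taking two values per occurrence:
# action="append" collects occurrences, nargs=2 takes PATH and LABEL.
parser = argparse.ArgumentParser(prog="onionperf visualize")
parser.add_argument("-d", "--data", required=True, action="append",
                    nargs=2, metavar=("PATH", "LABEL"))

args = parser.parse_args([
    "-d", "2019-01-11.onionperf.analysis.json.xz", "2019-01-11",
    "-d", "2019-01-21.onionperf.analysis.json.xz", "2019-01-21",
])
print(args.data)  # each -d contributes one [PATH, LABEL] pair
```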
== 4. Input data requirements
Input data consists of one or more data sets. Each data set uses the
values from exactly one OnionPerf analysis document in the JSON format,
which typically contains 1 UTC day of measurements from a single OnionPerf
instance with different requested file sizes and server types (public, v2
onion, v3 onion).
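Since the example command in section 3 passes xz-compressed analysis files, loading one boils down to something like the following sketch (the document's internal schema is not assumed here, only that it is JSON):

```python
import json
import lzma

# Load one OnionPerf analysis document, transparently handling
# xz compression based on the file extension.
def load_analysis(path):
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt") as f:
        return json.load(f)
```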
== 5. All produced output files
The `visualize` subcommand produces 2 PDF files as output:
The first output file is called `tgen.onionperf.viz.$timestamp.pdf` and
contains:
- time to download first byte, all clients
- mean time to download first of {51200,1048576,5242880} bytes, all
clients over time
- time to download {51200,1048576,5242880} bytes, all downloads
- median time to download {51200,1048576,5242880} bytes, each client
- mean time to download {51200,1048576,5242880} bytes, each client
- max time to download {51200,1048576,5242880} bytes, each client
- mean time to download last of {51200,1048576,5242880} bytes, all
clients over time
- number of {51200,1048576,5242880} byte downloads completed, each client
- number of {51200,1048576,5242880} byte downloads completed, all clients
over time
- number of transfer {PROXY,READ} errors, each client
- number of transfer {PROXY,READ} errors, all clients over time
- bytes transferred before {PROXY,READ} error, all downloads
- median bytes transferred before {PROXY,READ} error, each client
- mean bytes transferred before {PROXY,READ} error, each client
The second output file is called `tor.onionperf.viz.$timestamp.pdf` and
contains:
- 60 second moving average throughput, read, all relays
- 1 second throughput, read, all relays
- 1 second throughput, read, each relay
- 60 second moving average throughput, write, all relays
- 1 second throughput, write, all relays
- 1 second throughput, write, each relay
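Several of the tgen graphs above (e.g. "time to download ... bytes, all downloads") are empirical CDFs over a set of samples. The underlying computation is simple to state; here is a pure-Python sketch with made-up download times, not OnionPerf's actual implementation (which builds on numpy/scipy/matplotlib):

```python
# Empirical CDF: sort the samples and assign each the fraction of
# samples less than or equal to it.
def ecdf(samples):
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

times = [1.2, 0.8, 3.4, 0.8, 2.1]  # hypothetical download times in seconds
for x, p in ecdf(times):
    print(x, p)
```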
== Conclusions
There is some good news:
- The plotting libraries are pretty much standard and therefore a good
basis for making more and better graphs.
- The visualization code is nicely separated from the analysis and the
measurement code in OnionPerf.
- The user interface is very simple but also extensible towards adding
more and better graphs.
- There can be multiple input data sets per visualization, which is going
to be useful.
There are also some challenges:
- Input data sets are limited to a single analysis file each. This makes
it difficult to plot several days of measurements before/during/after an
experiment. In theory, it would be possible to process several days of
logs into a single analysis document with the `analyze` subcommand with
minimal code changes. But that requires having raw tgen and Tor controller
logs around for creating a visualization, which is not very practical.
- It's also not yet possible to filter measurements in the `visualize`
subcommand. In theory, these changes could be made in the `analyze`
subcommand to only include measurements of interest in the analysis file.
But that's also not very practical. It would be easier to make the
`visualize` subcommand more powerful by filtering measurements in each or
all data sets.
- Another aspect worth noting is that current visualizations are either
based on logs from the `tgen` process or the `tor` process running at the
client. Visualizations do not combine these two data sources, nor do they
consider logs from server-side processes.
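The filtering idea above could be as simple as applying a predicate to each data set's measurements before plotting. A sketch of that shape, with field names that are assumptions rather than OnionPerf's actual schema:

```python
# Keep only measurements matching a predicate; "size" and "error" are
# hypothetical field names used for illustration only.
def filter_measurements(measurements, predicate):
    return [m for m in measurements if predicate(m)]

measurements = [
    {"size": 51200, "error": None},
    {"size": 5242880, "error": "PROXY"},
    {"size": 51200, "error": "READ"},
]
ok = filter_measurements(measurements, lambda m: m["error"] is None)
print(len(ok))  # only the error-free measurement survives
```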
There we are. What did I miss? Setting to needs_review to hear what other
parts need (closer) review. If this covers everything to be reviewed, we
can resolve this ticket.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33255#comment:4>