[metrics-bugs] #33258 [Metrics/Onionperf]: Add CSV file export of graphed data
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed May 20 22:18:44 UTC 2020
#33258: Add CSV file export of graphed data
---------------------------------------+--------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Onionperf | Version:
Severity: Normal | Resolution:
Keywords: metrics-team-roadmap-2020 | Actual Points: 1
Parent ID: #33327 | Points: 3
Reviewer: acute | Sponsor: Sponsor59-must
---------------------------------------+--------------------------------
Comment (by acute):
I've reviewed this using Pandas version 0.25.3+dfsg-7 and Seaborn version
0.10.0-1. I think the code is ready to be merged, more details below.
Overall:
* The plots look significantly better!
* The code is more readable and maintainable.
* Analyses now take less time; the difference can appear small for
single files (~0.5 seconds), but becomes significant (in the order of
seconds) when analysing multiple data sets (e.g, 14s vs 11s when plotting
two days' worth of data in two data sets). This was measured with 'time'
over the same data.
Plots:
Here I tried to determine the correspondence between the old graphs and
the new ones, and compared the arrays of values produced by each of the
versions of the code.
* ECDFs "Time to download first byte" and "Time to download last of {
51200, 1048576, 5242880 } bytes" are equivalent to the old versions. The
output of values produced is the same between the versions, with the small
exception Karsten mentions above (the lowest and highest values are added
to extend the cdf lines)
* Plots "Time to download { first, last } of { 51200, 1048576, 5242880 }
bytes over time" are equivalent to "mean time to download {first, last} of
{ 51200, 1048576, 5242880 }". The output of values produced is the same
between the two versions. A scatter plot is a much better choice for this
data, there is no reason to have the dots connected.
* Plots:
- "Time to download last of { 51200, 1048576, 5242880 } bytes"
(boxplot)
- "Mean time to download last of { 51200, 1048576, 5242880 } bytes"
- "Number of downloads of { 51200, 1048576, 5242880 } bytes
completed"
These look far better than their old counterparts, and the values
correspond between the versions. I think they could do with slightly
better resolution (perhaps more ticks?) in the y axis in some cases - for
example, looking at "Mean time to download last of 1048576 bytes" in the
last pdfs attached to this ticket, I can only tell the value was above 4
for "2019-01-31-ab" and somewhere between 2 and 3 for "2019-01-30-nl". I
get to 2.8 and 4.4 immediately looking at the old version of the graph. It
would be nice to get the number at a glance without having to dig for it
in the data (perhaps this was also the reason for the "Max time to
download..." plots), but this might not scale well when plotting multiple
datasets.
* "Downloads failed over time" and "Number of downloads failed" - I
agree with the choice of replacing the other error plots in the old
versions, these are very readable and a definite improvement over the old
ones!
Other thoughts/TODOs:
* We depend on two new libraries, both of which are well supported and
documented. We should add those to our setup requirements, and update the
installation instructions in our documentation.
* We don't seem to have a style guide for python in metrics-team.
Personally, I use yapf (tool I also maintain in Debian) to format code
according to pep8 guidelines, and also pylint. Perhaps this is something
to look into/document.
* I do get the following warning with version 0.25.3 of pandas when
running the code, which suggests a change for the future:
{{{
/usr/lib/python3/dist-
packages/pandas/plotting/_matplotlib/converter.py:103: FutureWarning:
Using an implicitly registered datetime converter for a matplotlib
plotting method. The converter was registered by pandas on import. Future
versions of pandas will require you to explicitly register matplotlib
converters.
To register the converters:
>>> from pandas.plotting import register_matplotlib_converters
>>> register_matplotlib_converters()
warnings.warn(msg, FutureWarning)
}}}
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33258#comment:17>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the metrics-bugs
mailing list