[metrics-bugs] #29772 [Metrics/Website]: Plot nearly worst-case bandwidth when downloading from [public|onion] server
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Apr 25 14:01:23 UTC 2019
#29772: Plot nearly worst-case bandwidth when downloading from [public|onion]
server
-----------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: enhancement | Status: needs_review
Priority: Medium | Milestone:
Component: Metrics/Website | Version:
Severity: Normal | Resolution:
Keywords: scalability | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+------------------------------
Changes (by karsten):
* status: needs_revision => needs_review
Comment:
I should start this comment by saying that I'm not a statistician. If you
doubt anything I say below, please go re-read this first sentence! :)
I agree with you that the bandwidth plot works better than the latency
plot. We're excluding far fewer bandwidth values as outliers than latency
values.
However, I don't think that a 4-day moving average would fix this. As you
can see in the boxplots I posted here last week, medians and quartiles are
relatively stable over the days, and those values are what we're using to
decide whether a value is excluded as an outlier. After all, we have
around 144 latency values per day for each of the public and onion
servers. So, even if we considered 4 days (or even more) at a time, our
threshold for excluding values as outliers would not change much. Also,
implementing such a moving average wouldn't be trivial, with all the
missing data that we have to handle.
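
A rough way to check this argument, as a sketch only: the code below
assumes the outlier threshold is the usual boxplot upper fence
(Q3 + 1.5 * IQR), which this comment does not spell out, and it uses
synthetic log-normal values rather than real measurements. The name
upper_fence, the seed, and the distribution parameters are all made up
for illustration. It compares the per-day threshold with the threshold
over 4 pooled days.

```python
import random
import statistics

VALUES_PER_DAY = 144  # roughly the number of latency values per day


def upper_fence(values, k=1.5):
    """Upper boxplot fence Q3 + k * IQR (assumed outlier threshold)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 + k * (q3 - q1)


# Synthetic stand-in data: 4 days drawn from the same distribution,
# mimicking quartiles that are stable from day to day.
rng = random.Random(29772)
days = [
    [rng.lognormvariate(0.0, 1.0) for _ in range(VALUES_PER_DAY)]
    for _ in range(4)
]

per_day = [upper_fence(day) for day in days]
pooled = upper_fence([value for day in days for value in day])

for number, fence in enumerate(per_day, start=1):
    print(f"day {number}: upper fence {fence:.2f}")
print(f"4 days pooled: upper fence {pooled:.2f}")
```

If the daily quartiles really are stable, the pooled fence should land
close to the daily ones, which is the point made above. The sketch says
nothing about the missing-data handling a real moving window would need.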
I think the issue is that the way we're excluding outliers is based on the
assumption that our data is normally distributed. That assumption is
obviously not 100% correct for bandwidth, because there's no negative
bandwidth, but it's apparently close enough. It doesn't work very well
for latencies, because they follow some heavy-tailed distribution that
we don't know, so not all the values we're excluding are really
outliers.
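
To illustrate the difference, here is a minimal sketch, assuming the
exclusion rule is the common boxplot one (drop anything beyond 1.5 * IQR
from the quartiles). The rule, the function names, and both synthetic
distributions are assumptions for illustration, not our actual code or
data. It compares how much of a roughly normal sample and of a
heavy-tailed sample gets excluded.

```python
import random
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def excluded_fraction(values, k=1.5):
    """Fraction of values that fall outside the fences."""
    lower, upper = tukey_fences(values, k)
    outside = sum(1 for v in values if v < lower or v > upper)
    return outside / len(values)


rng = random.Random(29772)
# Hypothetical stand-ins: a bell-shaped sample and a heavy-tailed one.
normal_like = [rng.gauss(100.0, 10.0) for _ in range(10_000)]
heavy_tailed = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

print(f"normal-like excluded:  {excluded_fraction(normal_like):.2%}")
print(f"heavy-tailed excluded: {excluded_fraction(heavy_tailed):.2%}")
```

On the heavy-tailed sample the rule throws out a much larger share of
values, nearly all of them in the upper tail, even though they are
ordinary draws from that distribution.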
Another reason could be that we're looking at the smallest bandwidth
values, which are at the ''head'' of the distribution, and at the largest
latency values, which are the heavy ''tail''.
However, my suggestion is to ignore all this and make the plots as you
suggested earlier and as I plotted them last week. Two reasons:
1. Boxplots are understood by many people, and if we say that we're
plotting the five values from boxplots, people will understand what we're
doing.
2. We need a baseline, even if it's not 100% correct in a
mathematical/statistical sense. If our way to exclude outliers is flawed,
it will be flawed for past measurements as well as for future
measurements, in the exact same way.
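
For reference, the five boxplot values could be computed roughly as
follows. This is a sketch assuming the conventional definition (whiskers
end at the most extreme values still within 1.5 * IQR of the quartiles);
the function name boxplot_five and the sample values are hypothetical.

```python
import statistics


def boxplot_five(values, k=1.5):
    """Return (lower whisker, Q1, median, Q3, upper whisker).

    Whiskers are the smallest and largest values that still lie within
    k * IQR of the quartiles; anything beyond counts as an outlier.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in data if lower_fence <= v <= upper_fence]
    return inliers[0], q1, median, q3, inliers[-1]


# Made-up values with one obvious outlier (100).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(boxplot_five(sample))  # (1, 3.0, 5.0, 7.0, 8)
```

The value 100 lies above the upper fence of 13, so the upper whisker
stops at 8.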
Regarding your rocket analogy: it's certainly not just distance between
relays that we're seeing here. We're also seeing overfull queues keeping
received cells waiting for crypto and forwarding to the next relay. But
this is fine: we want to know how long it takes to send something over the
circuit and get back a response.
So, my suggestion would be to move forward with what we have. What do you
think?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29772#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online