[metrics-bugs] #29787 [Metrics/Onionperf]: Enumerate possible failure cases and include failure information in .tpf output
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Apr 5 15:31:18 UTC 2019
#29787: Enumerate possible failure cases and include failure information in .tpf
output
-------------------------------+------------------------------
Reporter: karsten | Owner: metrics-team
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Onionperf | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------+------------------------------
Comment (by karsten):
Hi acute!
Your idea to extend the code that matches tgen logs and tor control port
event logs sounds interesting. Is that going to replace OnionPerf's
analysis.py? If so, why don't you extend or replace that code rather than
start a new code base?
However, I wonder if we could start simpler here by looking at the
tgen logs alone:
1. For an initial classification of failure cases it might be sufficient
to learn ''when'' a request fails and ''how''. For example, in which request
phase does a request fail, and how much time has elapsed up to that point?
Maybe the tgen logs also tell us how a request failed, that is, whether
the tor process sent an error, or tgen ran into a timeout, a stallout (even
though we're setting stallout high enough that this currently doesn't
happen), a checksum error, or something else. It would be good to know what
fraction of requests succeeded and what fractions failed at the various
request stages. This is all based on tgen information, which is the
application point of view that treats tor as a black box.
2. The next step after that, for me, would be to match tgen logs with tor
control port event logs. I wonder why we'd be using the source port for
this. Is that to handle potentially overlapping requests? Do we handle
cases where a source port is reused over the course of a day, for example
by also matching on time? And
what do we do if no corresponding source port is found in the other log
file, or is that scenario unrealistic/impossible? In short, this sounds
complicated and potentially error-prone. Maybe we could simplify this by
doing the matching solely based on timing information? And do you think we
could also match tor logs (not control port events) using the same
timing information, assuming that there's anything interesting in these
logs?
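To make the timing-only matching from step 2 concrete, here is a minimal sketch. It assumes both logs have already been parsed into timestamps; the Transfer class, its field names, and the slack parameter are hypothetical and not part of OnionPerf or tgen. It also shows where timing alone falls short: events that fall into overlapping transfer windows are reported as ambiguous rather than guessed.

```python
from dataclasses import dataclass, field

@dataclass
class Transfer:
    # Hypothetical record for one tgen request; not tgen's real format.
    transfer_id: str
    start: float  # Unix timestamp when the request started
    end: float    # Unix timestamp when it finished or failed
    events: list = field(default_factory=list)

def match_by_time(transfers, events, slack=0.0):
    """Attach each (timestamp, payload) event to the one transfer whose
    time window contains it. Returns (unmatched, ambiguous) events."""
    unmatched, ambiguous = [], []
    for ts, payload in events:
        candidates = [t for t in transfers
                      if t.start - slack <= ts <= t.end + slack]
        if len(candidates) == 1:
            candidates[0].events.append((ts, payload))
        elif not candidates:
            unmatched.append((ts, payload))
        else:
            # Overlapping requests: timing alone cannot decide.
            ambiguous.append((ts, payload))
    return unmatched, ambiguous

# Made-up example data, for illustration only.
transfers = [Transfer("a", 0.0, 10.0), Transfer("b", 20.0, 30.0)]
events = [(5.0, "x"), (15.0, "y"), (25.0, "z")]
unmatched, ambiguous = match_by_time(transfers, events)
```

If the ambiguous list stays empty for real measurements, timing may be enough; if not, something like the source port would still be needed as a tie-breaker.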
Sadly, the weekend is almost here and I likely won't be able to spend much
time on this analysis over the weekend. But if I find time, I'll start by
reading tgen logs and writing little helper tools to classify failure
cases solely based on tgen logs. I'll share measurement identifiers of
some sort for failure cases as I find them.
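A minimal sketch of what such a helper tool might look like, assuming the tgen logs have already been parsed into one record per request. The field names (phase, error, elapsed) and the sample values are assumptions for illustration only, not tgen's actual log format.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed tgen results; made-up values.
records = [
    {"phase": "payload", "error": None, "elapsed": 1.8},
    {"phase": "payload", "error": None, "elapsed": 2.1},
    {"phase": "connect", "error": "timeout", "elapsed": 60.0},
    {"phase": "payload", "error": "checksum", "elapsed": 3.4},
]

def classify(record):
    """Label a request as 'success' or '<error>@<phase it failed in>'."""
    if record["error"] is None:
        return "success"
    return f"{record['error']}@{record['phase']}"

def summarize(records):
    """Return the fraction of requests per label and the elapsed
    times observed for each label."""
    counts = Counter()
    elapsed = defaultdict(list)
    for r in records:
        label = classify(r)
        counts[label] += 1
        elapsed[label].append(r["elapsed"])
    total = sum(counts.values())
    fractions = {label: n / total for label, n in counts.items()}
    return fractions, dict(elapsed)

fractions, elapsed = summarize(records)
for label, share in sorted(fractions.items()):
    print(f"{label}: {share:.0%} of requests, elapsed {elapsed[label]}")
```

This only uses the application's point of view, so it says when and how a request failed, not why tor failed it.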
Thanks!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29787#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online