[metrics-bugs] #29787 [Metrics/Onionperf]: Enumerate possible failure cases and include failure information in .tpf output

Tor Bug Tracker & Wiki blackhole at torproject.org
Wed Apr 24 07:12:26 UTC 2019


#29787: Enumerate possible failure cases and include failure information in .tpf
output
-------------------------------+------------------------------
 Reporter:  karsten            |          Owner:  metrics-team
     Type:  enhancement        |         Status:  new
 Priority:  Medium             |      Milestone:
Component:  Metrics/Onionperf  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------

Comment (by karsten):

 Alright, I finally made some progress here!

 Last things first, I made the following plot:

 [[Image(op_errors-2019-04-24.png, 500px)]]

 This plot uses your script with a minor extension:

 {{{
 diff --git a/op_errors.py b/op_errors.py
 index 1c8b278..7169e4d 100644
 --- a/op_errors.py
 +++ b/op_errors.py
 @@ -131,6 +131,7 @@ def main():
              #if there are no failures at all in the circuit data then the csv column will simply be left empty
              pass
          header = [
 +            'unix_ts_end', 'hostname_local',
              'transfer_id', 'is_error', 'error_code', 'state_failed',
              'total_seconds', 'endpoint_remote', 'total_bytes_read',
              'circuit_id', 'stream_id','buildtime_seconds', 'failure_reason_local',
 }}}

 I fed it all the OnionPerf .json files that we have.

 Then I merged the three fields `error_code`, `failure_reason_local` (if
 present), and `failure_reason_remote` (if present, and only if
 `failure_reason_local` is present, too) into a single combined error code.
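
 The combination rule can be sketched roughly as follows. This is only an
 illustration, not the code used for the plot: the function name, the `/`
 separator, and the `EXAMPLE_*` sample values are assumptions made up for
 this sketch. Only the three field names and the `READ` code come from the
 ticket.

```python
from collections import Counter

# Hypothetical separator; the ticket does not say how the parts are joined.
SEPARATOR = "/"


def combined_error_code(row):
    """Build a combined error code from one CSV row (a dict of strings).

    error_code is always used. failure_reason_local is appended if present.
    failure_reason_remote is appended only if it is present AND
    failure_reason_local is present, too.
    """
    parts = [row["error_code"]]
    local = row.get("failure_reason_local") or ""
    remote = row.get("failure_reason_remote") or ""
    if local:
        parts.append(local)
        if remote:
            parts.append(remote)
    return SEPARATOR.join(parts)


# Made-up sample rows, for illustration only (not real OnionPerf data).
rows = [
    {"error_code": "READ", "failure_reason_local": "", "failure_reason_remote": ""},
    {"error_code": "EXAMPLE", "failure_reason_local": "EXAMPLE_LOCAL", "failure_reason_remote": ""},
    {"error_code": "EXAMPLE", "failure_reason_local": "EXAMPLE_LOCAL", "failure_reason_remote": "EXAMPLE_REMOTE"},
    # The remote reason is ignored here because the local reason is missing.
    {"error_code": "EXAMPLE", "failure_reason_local": "", "failure_reason_remote": "EXAMPLE_REMOTE"},
]

# Tally how often each combined error code occurs.
counts = Counter(combined_error_code(row) for row in rows)
```

 Counting the distinct keys of a tally like `counts` over the real data is
 one way to arrive at the number of combined error codes.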

 The result is that we have 11 combined error codes now, which are all in
 the graph.

 The next step will be to understand in more detail what causes these
 errors. For example:
  - `READ` is a fun one. The cases I looked at (all from op-ab) were all
 onion service cases. The server had finished sending the response, and
 all data was "in flight". Yet, some time later, the client had its
 connection closed shortly before receiving the last remaining bytes. This
 could be a bug, but it needs closer investigation.

 acute, if you'd like to take a look, too, maybe write down which combined
 error codes you're going to look at, so that we can avoid duplicating
 effort. (Thanks for all your efforts so far!)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/29787#comment:20>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

