[tor-bugs] #7358 [Tor]: Decide on list of stats to collect

Tor Bug Tracker & Wiki blackhole at torproject.org
Thu Nov 8 20:00:16 UTC 2012


#7358: Decide on list of stats to collect
------------------------------------------------------------------------+---
 Reporter:  robgjansen                                                  |          Owner:                    
     Type:  task                                                        |         Status:  new               
 Priority:  normal                                                      |      Milestone:  Tor: 0.2.4.x-final
Component:  Tor                                                         |        Version:                    
 Keywords:  performance, simulation, statistics, tor-relay, tor-client  |         Parent:  #7357             
   Points:                                                              |   Actualpoints:                    
------------------------------------------------------------------------+---

Old description:

> Here are the statistics I speculate will be useful, and may or may not
> already be available in some form, and may only be available externally
> (outside of the Tor code). Keep in mind that some of this is not intended
> to be collected outside of an experimentation environment, else proper
> aggregation/scrubbing is required.
>
> Client
>    * download statistics (how long to get to first, last, ? byte)
>    * circuit build times, build timeouts
>    * which relays were chosen for each circuit, and during which time
> intervals
>
> client+relay
>    * cell statistics: # queued, processed, waiting times
>    * total number of OR connections, circuits, and streams over time
>    * various throughputs (stream, circuit, connection) over various
> intervals (last second, 10 seconds, 60 seconds, 300 seconds, ? seconds)
>    * when streams, circuits, or connections change active/inactive status
>    * indications of congestion (inferred by how fast/often token buckets
> were emptied/empty, queuing times from above)
>
> relay
>    * protocol overheads (raw client data vs protocol traffic)
>
> What am I missing?

New description:

 Here are the statistics I speculate will be useful, and may or may not
 already be available in some form, and may only be available externally
 (outside of the Tor code). Keep in mind that some of this is not intended
 to be collected outside of an experimentation environment, else proper
 aggregation/scrubbing is required.

 client
    * circuit build times, build timeouts
    * which relays were chosen for each circuit, and during which time
 intervals
    * number of streams over time
    * stream throughput over time
    * how long streams have been active/inactive
    * number of and bandwidth expended by client directory operations

 client+relay
    * cell statistics: number queued and processed, waiting times
    * total number of circuits and the various connection types (AP, OR,
 EXIT, DIR) over time
    * throughput of circuits and the various connection types over time
    * when streams, circuits, or connections change active/inactive status
    * how fast/often token buckets were emptied/empty

 relay
    * protocol overheads (raw client data vs protocol traffic)
    * number of and bandwidth expended by directory server operations

 What am I missing?
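
 As an illustration of the "how fast/often token buckets were
 emptied/empty" item, here is a minimal sketch in Python. It is not Tor's
 token bucket code: the class name, method names, and the assumption that
 refills happen in discrete ticks are all made up for this example.

```python
class TokenBucketStats:
    """Hypothetical token bucket that records how often it runs dry.

    Assumes discrete refill ticks, so the bucket stays empty from the
    moment it is drained until the next refill.
    """

    def __init__(self, rate, burst):
        self.rate = rate            # tokens (bytes) added per second
        self.burst = burst          # maximum number of tokens held
        self.tokens = burst
        self.times_emptied = 0      # how often the bucket hit zero
        self.seconds_empty = 0.0    # total time spent at zero
        self._emptied_at = None     # timestamp of the current dry spell

    def consume(self, nbytes, now):
        """Spend up to nbytes tokens; return the number actually granted."""
        granted = min(nbytes, self.tokens)
        self.tokens -= granted
        if granted > 0 and self.tokens == 0:
            self.times_emptied += 1
            self._emptied_at = now
        return granted

    def refill(self, interval, now):
        """Add rate * interval tokens, as a periodic refill tick would."""
        if self._emptied_at is not None:
            self.seconds_empty += now - self._emptied_at
            self._emptied_at = None
        self.tokens = min(self.burst, self.tokens + int(self.rate * interval))
```

 Reporting both the count and the accumulated empty time separates a
 bucket that is drained often but briefly from one that sits empty for
 long stretches.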

--

Comment(by robgjansen):

 Replying to [comment:2 karsten]:
 > Replying to [ticket:7358 robgjansen]:
 > > Client
 > >    * download statistics (how long to get to first, last, ? byte)
 > >    * circuit build times, build timeouts
 > >    * which relays were chosen for each circuit, and during which time
 intervals
 >
 > This is basically what Torperf does.  The first item is what
 trivsocks-client tells us, and the second and third items are what we learn from
 control port events.  Implementing the first item in Tor using control
 port events is sure tempting.  We lack some information, e.g., the exact
 timestamp when Torperf started the download and the expected number of
 bytes we want to download.  But maybe we can compensate for that.  For
 example, we could collect times between sending the first byte and
 receiving 1B, 1kB, 2kB, 5kB, 10kB, 20kB, 50kB, 100kB, 200kB, 500kB, and so
 on, up to the last byte.  Not exactly the same as the deciles we have so
 far, but should be fine for most purposes.  Also, we'll have to do that
 for all circuits that the client opens on behalf of the user.

 I'm of the opinion that we should let the user application (Torperf, etc.)
 continue measuring end-user-specific performance characteristics,
 precisely because of the missing information and imprecision that you
 mentioned. Tor should stick to things it's good at measuring, like
 throughput.
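
 For reference, the threshold-based timing described in the quoted
 paragraph could be sketched as follows, wherever it ends up living. This
 is only an illustration: the class and method names are hypothetical, and
 treating 1kB as 1000 bytes is an assumption.

```python
# Byte counts at which elapsed time is recorded (1B, 1kB, 2kB, ... 500kB).
# Assumption: 1kB means 1000 bytes.
THRESHOLDS = [1, 1_000, 2_000, 5_000, 10_000, 20_000,
              50_000, 100_000, 200_000, 500_000]


class StreamMilestones:
    """Hypothetical per-stream recorder of time-to-N-bytes milestones."""

    def __init__(self, request_sent_at, thresholds=THRESHOLDS):
        self.start = request_sent_at        # when the first byte was sent
        self.pending = sorted(thresholds)   # thresholds not yet reached
        self.received = 0                   # bytes received so far
        self.times = {}                     # threshold -> seconds since start

    def on_data(self, nbytes, now):
        """Account for nbytes received at time now."""
        self.received += nbytes
        # One read may cross several thresholds at once.
        while self.pending and self.received >= self.pending[0]:
            self.times[self.pending.pop(0)] = now - self.start

    def on_close(self, now):
        """Record the time to the last byte when the stream ends."""
        self.times["last"] = now - self.start
```

 Because the expected download size is unknown, thresholds that are never
 reached simply stay absent from the result.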

 >
 > Another item for the client section here might be directory client
 operations.  We might want to keep track of how many directory requests a
 client has made and how many bytes it has sent or received for that.  Then
 we can compare different directory designs more easily.
 >
 > Also, I think we need to move all statistics from client+relay that have
 to do with streams to the client-only section.
 Great! I've updated the description.
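
 A minimal sketch of the directory client counters, assuming requests are
 tallied per kind of document. The class name and the kind labels
 ("consensus", "descriptor") are placeholders, not names taken from Tor.

```python
from collections import defaultdict


class DirClientStats:
    """Hypothetical tally of directory requests and the bytes they cost."""

    def __init__(self):
        self.requests = defaultdict(int)        # kind -> number of requests
        self.bytes_sent = defaultdict(int)      # kind -> bytes sent
        self.bytes_received = defaultdict(int)  # kind -> bytes received

    def record(self, kind, sent, received):
        """Account for one finished directory request of the given kind."""
        self.requests[kind] += 1
        self.bytes_sent[kind] += sent
        self.bytes_received[kind] += received

    def totals(self):
        """Return (requests, bytes sent, bytes received) over all kinds."""
        return (sum(self.requests.values()),
                sum(self.bytes_sent.values()),
                sum(self.bytes_received.values()))
```

 Running the same workload under two directory designs and comparing
 totals() would give the comparison mentioned above; the same structure
 would work for the directory server side.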
 > > client+relay
 > >    * cell statistics: # queued, processed, waiting times
 > >    * total number of OR connections, circuits, and streams over time
 >
 > What about other connections than OR?
 Good point :) Updated.
 > >    * various throughputs (stream, circuit, connection) over various
 intervals (last second, 10 seconds, 60 seconds, 300 seconds, ? seconds)
 > >    * when streams, circuits, or connections change active/inactive
 status
 > >    * indications of congestion (inferred by how fast/often token
 buckets were emptied/empty, queuing times from above)
 >
 > I assume these will all be implemented as asynchronous control port
 events, right?  Which of them will be emitted whenever there's a change,
 and which will be emitted periodically?
 >
 > Another item might be statistics on crypto operations as described in
 #7134, but without the aggregation step that isn't necessary if we collect
 these statistics in a simulation/testing environment.  The two can
 probably share a lot of code.
 This does seem important. My vision of a "stats" module in #7359 should
 help avoid duplication of code and help separate statistics from
 functionality.
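
 To make the change-driven versus periodic question concrete, here is a
 rough sketch of how such a module could separate the two. Everything in
 it is hypothetical: the class, the event names, and the KEY=value line
 format are invented for illustration and are not existing control port
 events.

```python
class StatsModule:
    """Hypothetical stats module with two reporting styles."""

    def __init__(self, emit):
        self.emit = emit    # callback that receives one formatted line
        self.gauges = {}    # name -> latest value, reported on each tick

    def event(self, name, **fields):
        """Change-driven: emit a line as soon as something changes."""
        parts = [name] + [f"{key}={value}" for key, value in fields.items()]
        self.emit(" ".join(parts))

    def set_gauge(self, name, value):
        """Periodic: remember the value; it is emitted on the next tick."""
        self.gauges[name] = value

    def tick(self):
        """Emit every gauge once, e.g. from a once-per-second timer."""
        for name in sorted(self.gauges):
            self.emit(f"{name} value={self.gauges[name]}")
```

 Status changes (a circuit going active or inactive) fit event(), while
 counts and throughputs that change constantly fit set_gauge() plus a
 periodic tick().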
 > > relay
 > >    * protocol overheads (raw client data vs protocol traffic)
 >
 > We also have statistics on bi-directional connection usage already in
 Tor.  But these are probably contained somewhere in the client+relay
 section.
 >
 > And we might add statistics on directory server operations with the same
 reasoning as adding directory client operations.
 Updated.
 > > What am I missing?
 >
 > Not sure, I guess we'll find out while working off this list.  Count me
 in, this is fun stuff. :)

 Awesome:)

-- 
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/7358#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online

