[tor-bugs] #7358 [Tor]: Decide on list of stats to collect
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Nov 8 20:00:16 UTC 2012
#7358: Decide on list of stats to collect
------------------------------------------------------------------------+---
Reporter: robgjansen | Owner:
Type: task | Status: new
Priority: normal | Milestone: Tor: 0.2.4.x-final
Component: Tor | Version:
Keywords: performance, simulation, statistics, tor-relay, tor-client | Parent: #7357
Points: | Actualpoints:
------------------------------------------------------------------------+---
Old description:
> Here are the statistics I speculate will be useful, and may or may not
> already be available in some form, and may only be available externally
> (outside of the Tor code). Keep in mind that some of this is not intended
> to be collected outside of an experimentation environment, else proper
> aggregation/scrubbing is required.
>
> Client
> * download statistics (how long to get to first, last, ? byte)
> * circuit build times, build timeouts
> * which relays were chosen for each circuit, and during which time
> intervals
>
> client+relay
> * cell statistics: # queued, processed, waiting times
> * total number of or connections, circuits, and streams over time
> * various throughputs (stream, circuit, connection) over various
> intervals (last second, 10 seconds, 60 seconds, 300 seconds, ? seconds)
> * when streams, circuits, or connections change active/inactive status
> * indications of congestion (inferred by how fast/often token buckets
> were emptied/empty, queuing times from above)
>
> relay
> * protocol overheads (raw client data vs protocol traffic)
>
> What am I missing?
New description:
Here are the statistics I speculate will be useful. They may or may not
already be available in some form, and some may only be available
externally (outside of the Tor code). Keep in mind that some of these are
not intended to be collected outside of an experimentation environment;
otherwise proper aggregation/scrubbing is required.
client
* circuit build times, build timeouts
* which relays were chosen for each circuit, and during which time
intervals
* number of streams over time
* stream throughput over time
* how long streams have been active/inactive
* number of and bandwidth expended by client directory operations
client+relay
* cell statistics: number queued and processed, waiting times
* total number of circuits and the various connection types (AP, OR,
EXIT, DIR) over time
* throughput of circuits and the various connection types over time
* when streams, circuits, or connections change active/inactive status
* how fast/often token buckets were emptied/empty
relay
* protocol overheads (raw client data vs protocol traffic)
* number of and bandwidth expended by directory server operations
What am I missing?
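
To make the token bucket item above concrete, here is a minimal sketch
of a bucket that counts how often it is emptied and how long it stays
empty. It is not Tor's implementation: the class name, the method names,
and the explicit `now` timestamps are all assumptions made for
illustration.

```python
class InstrumentedTokenBucket:
    """Hypothetical token bucket that records when it runs empty."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate            # tokens (bytes) added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = burst         # start with a full bucket
        self.times_emptied = 0      # how often the bucket hit zero
        self.seconds_empty = 0.0    # total time spent at zero
        self._last_refill = now
        self._emptied_at = None     # timestamp of the current empty period

    def consume(self, now, wanted):
        """Take up to `wanted` tokens and return how many were granted."""
        granted = min(wanted, self.tokens)
        self.tokens -= granted
        if granted > 0 and self.tokens == 0:
            # This call drained the bucket: start a new empty period.
            self.times_emptied += 1
            self._emptied_at = now
        return granted

    def refill(self, now):
        """Add tokens for the time elapsed since the last refill."""
        elapsed = now - self._last_refill
        self._last_refill = now
        self.tokens = min(self.burst, self.tokens + self.rate * elapsed)
        if self.tokens > 0 and self._emptied_at is not None:
            # The empty period ends as soon as tokens are available again.
            self.seconds_empty += now - self._emptied_at
            self._emptied_at = None
```

A caller would invoke `consume()` whenever it wants to read or write and
`refill()` on a timer, then report `times_emptied` and `seconds_empty`
as the congestion indicators.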
--
Comment(by robgjansen):
Replying to [comment:2 karsten]:
> Replying to [ticket:7358 robgjansen]:
> > Client
> > * download statistics (how long to get to first, last, ? byte)
> > * circuit build times, build timeouts
> > * which relays were chosen for each circuit, and during which time
> > intervals
>
> This is basically what Torperf does. The first item is what
> trivsocks-client tells us, and the second and third items are what we
> learn from control port events. Implementing the first item in Tor using
> control port events is sure tempting. We lack some information, e.g., the
> exact timestamp when Torperf started the download and the expected number
> of bytes we want to download. But maybe we can compensate for that. For
> example, we could collect times between sending the first byte and
> receiving 1B, 1kB, 2kB, 5kB, 10kB, 20kB, 50kB, 100kB, 200kB, 500kB, and so
> on, up to the last byte. Not exactly the same as the deciles we have so
> far, but should be fine for most purposes. Also, we'll have to do that
> for all circuits that the client opens on behalf of the user.
I'm of the opinion that we let the user application (Torperf, etc.)
continue measuring end-user-specific performance characteristics,
precisely because of the lack of information and imprecision that you
mentioned. Tor should stick to things it's good at measuring, like
throughput.
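
For reference, the byte-threshold timing suggested in the quoted text
could be sketched roughly as follows, wherever it ends up living. This
is an illustration only: `ThresholdTimer` and its methods are
hypothetical names, and treating 1kB as 1000 bytes is an assumption.

```python
import time

# Thresholds from the quoted proposal, assuming 1kB = 1000 bytes.
THRESHOLDS = [1, 1000, 2000, 5000, 10000, 20000,
              50000, 100000, 200000, 500000]


class ThresholdTimer:
    """Hypothetical helper: time from first byte sent until the number
    of received bytes first reaches each threshold."""

    def __init__(self, thresholds=THRESHOLDS, clock=time.monotonic):
        self._thresholds = sorted(thresholds)
        self._clock = clock
        self._start = None
        self._received = 0
        self._next = 0      # index of the next threshold to reach
        self.times = {}     # threshold in bytes -> elapsed seconds

    def first_byte_sent(self):
        """Mark the start of the measurement."""
        self._start = self._clock()

    def bytes_received(self, nbytes):
        """Account for `nbytes` newly received bytes."""
        if self._start is None:
            raise RuntimeError("first_byte_sent() must be called first")
        self._received += nbytes
        elapsed = self._clock() - self._start
        # One read may cross several thresholds at once.
        while (self._next < len(self._thresholds)
               and self._received >= self._thresholds[self._next]):
            self.times[self._thresholds[self._next]] = elapsed
            self._next += 1
```

Thresholds that are never reached simply stay absent from `times`, so a
short download does not produce misleading entries.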
>
> Another item for the client section here might be directory client
> operations. We might want to keep track of how many directory requests a
> client has made and how many bytes it has sent or received for them. Then
> we can compare different directory designs more easily.
>
> Also, I think we need to move all statistics from client+relay that have
> to do with streams to the client-only section.
Great! I've updated the description.
> > client+relay
> > * cell statistics: # queued, processed, waiting times
> > * total number of or connections, circuits, and streams over time
>
> What about other connections than OR?
Good point :) Updated.
> > * various throughputs (stream, circuit, connection) over various
> > intervals (last second, 10 seconds, 60 seconds, 300 seconds, ? seconds)
> > * when streams, circuits, or connections change active/inactive
> > status
> > * indications of congestion (inferred by how fast/often token
> > buckets were emptied/empty, queuing times from above)
>
> I assume these will all be implemented as asynchronous control port
> events, right? Which of them will be emitted whenever there's a change,
> and which will be emitted periodically?
>
> Another item might be statistics on crypto operations as described in
> #7134, but without the aggregation step that isn't necessary if we collect
> these statistics in a simulation/testing environment. The two can
> probably share a lot of code.
This does seem important. My vision of a "stats" module in #7359 should
help avoid duplication of code and help separate statistics from
functionality.
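
Coming back to the throughput item in the quoted list (stream, circuit,
and connection throughput over the last 1, 10, 60, and 300 seconds), a
stats module could keep one small sliding window per object. The sketch
below is only an illustration under that assumption: `ThroughputWindow`
is a hypothetical name and nothing here reflects existing Tor code.

```python
from collections import deque

# Reporting intervals in seconds, taken from the quoted list.
INTERVALS = (1, 10, 60, 300)


class ThroughputWindow:
    """Hypothetical per-stream/circuit/connection byte counter."""

    def __init__(self, max_interval=max(INTERVALS)):
        self._max = max_interval
        self._samples = deque()     # (timestamp, nbytes), oldest first

    def record(self, timestamp, nbytes):
        """Note that `nbytes` were transferred at `timestamp`."""
        self._samples.append((timestamp, nbytes))
        # Drop samples that fall outside the longest interval.
        while self._samples and self._samples[0][0] <= timestamp - self._max:
            self._samples.popleft()

    def throughput(self, now, interval):
        """Average bytes per second over (now - interval, now]."""
        total = sum(n for ts, n in self._samples
                    if now - interval < ts <= now)
        return total / interval
```

Calling `throughput(now, i)` for each `i` in `INTERVALS` yields the set
of values that a periodic report could carry. Summing the samples on
every query is simple but linear in the window size; running per-interval
totals would be cheaper if the window holds many samples.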
> > relay
> > * protocol overheads (raw client data vs protocol traffic)
>
> We also have statistics on bi-directional connection usage already in
> Tor. But these are probably contained somewhere in the client+relay
> section.
>
> And we might add statistics on directory server operations with the same
> reasoning as adding directory client operations.
Updated.
> > What am I missing?
>
> Not sure, I guess we'll find out while working off this list. Count me
> in, this is fun stuff. :)
Awesome :)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/7358#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online