[tor-dev] Understanding bwauth data in Stem?

Karsten Loesing karsten at torproject.org
Tue Dec 9 08:02:56 UTC 2014


On 06/12/14 00:26, Anna Kornfeld Simpson wrote:
> Thanks all for the responses!
> 
> On Fri, Nov 21, 2014 at 4:53 PM, Sebastian Hahn
> <sebastian at torproject.org> wrote:
> 
>> Hi there,
>> 
>> On 21 Nov 2014, at 23:44, Damian Johnson <atagar at torproject.org>
>> wrote:
>>>> In other words, if I sorted the descriptors by "measured"
>>>> value, what
>> would
>>>> that order mean?
>>> 
>>> I *think* that would be the ordering of 'relays who receive the
>>> most tor client traffic due to having a more highly weighted
>>> heuristic for relay selection'.
>> 
>> that would be accurate, is my understanding
>> 
> 
> Is there documentation of why this "heuristic for relay selection"
> does not correlate that well with "bandwidth" in the descriptor?
> I've attached a couple of scatter plots pulled from moria1's
> "measured" and "bandwidth" values for each descriptor a couple
> hours ago (and the plots look similar from the other bwauths).  One
> shows all values, the other shows the bottom 75% of values (sorted
> by measurements), and neither shows as much of a correlation as I
> would expect.  Are there factors other than bandwidth that 
> contribute to this "heuristic for relay selection"?

Hi Anna,

I don't have answers, but here are some ideas for further investigation:

 - Not sure if this was mentioned before, but did you take a look at
the spec?
https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt

 - Maybe try removing bandwidth values close to 10000, or just values
exactly at 10000.  IIRC, values are capped at that value.  (Removing
just those values may be more accurate than removing the top 25%.)

 - Very small bandwidth values might result from newly started or
restarted relays.  (Advertised) bandwidth values are "the volume of
traffic, both incoming and outgoing, that a relay is willing to
sustain, as configured by the operator and claimed to be observed from
recent data transfers."  If a relay hasn't observed larger data
transfers recently, the reported bandwidth value will be small, even
though the (past) measurements might still be large.  Maybe compare
the two values for single relays over time.

 - There's an interesting pattern at what looks like 1024 kB/s.  Maybe
there are more at 512 kB/s and other round values.  Can you reduce the
amount of overplotting in the graph?  In R/ggplot2, you'd set the
"alpha" value to something smaller than 1, so that dots become
somewhat transparent.  It could be that these patterns are normal,
because operators tend to pick certain bandwidth rates more often than
others.
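
A minimal sketch of two of the checks above, assuming the data is
already available as (bandwidth, measured) pairs, one pair per relay.
It uses only the Python standard library and does not touch Stem; the
function names, the sample numbers, and the choice of a rank
correlation are illustrative assumptions, and the cap of 10000 is the
unverified "IIRC" value from above.  Counting identical bandwidth
values is a numeric stand-in for the transparency trick, not a
replacement for the plot.

```python
from collections import Counter

CAP = 10000  # assumed cap on bandwidth values (the "IIRC" above, unverified)


def drop_capped(pairs, cap=CAP):
    """Drop (bandwidth, measured) pairs whose bandwidth sits at or above the cap."""
    return [(b, m) for b, m in pairs if b < cap]


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(pairs):
    """Rank correlation between the bandwidth and measured columns."""
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def common_bandwidths(pairs, top=5):
    """Most frequent bandwidth values, to spot pile-ups such as 512 or 1024."""
    return Counter(b for b, _ in pairs).most_common(top)


if __name__ == "__main__":
    # Made-up sample values, only to show how the helpers fit together.
    sample = [(1024, 900), (1024, 1100), (512, 400), (10000, 50), (2048, 3000)]
    kept = drop_capped(sample)
    print(len(sample) - len(kept), "pair(s) dropped at the cap")
    print("rank correlation without capped values:", spearman(kept))
    print("most common bandwidth values:", common_bandwidths(sample, top=3))
```

Comparing spearman() on the full list with spearman() on the output
of drop_capped() would show how much the capped values alone distort
the picture.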

All the best,
Karsten


