[tor-dev] Get Stem and zoossh to talk to each other

Sun Aug 16 21:44:40 UTC 2015

>> > Ideally, zoossh should do the heavy lifting as it's implemented in a
>> > compiled language.
>>
>> This is assuming zoossh is dramatically faster than Stem by virtue of being
>> compiled. I know we've discussed this before but I forget the results - with
>> the latest tip of Stem (ie, with lazy loading) how do they compare? I'd expect
>> time to be mostly bound by disk IO, so little to no difference.
>
> zoossh's test framework says that it takes 36364357 nanoseconds to
> lazily parse a consensus that is cached in memory (to eliminate the I/O
> bottleneck).  That amounts to approximately 27 consensuses a second.
>
> I used the following simple Python script to get a similar number for
> Stem:
>
>     import stem.descriptor
>
>     with open(file_name) as consensus_file:
>         for router in stem.descriptor.parse_file(consensus_file,
>                 'network-status-consensus-3 1.0',
>                 document_handler = stem.descriptor.DocumentHandler.ENTRIES):
>             pass
>
> This script manages to parse 24 consensus files in ~13 seconds, which
> amounts to 1.8 consensuses a second.  Let me know if there's a more
> efficient way to do this in Stem.

Interesting! First thought is 'wonder if zoossh is even reading the
file content'. Couple quick things to try are...

    with open(file_name) as consensus_file:
        consensus_file.read()

... to see how much time is disk IO versus parsing. Second is to try
doing something practical (say, count the number of relays with the
exit flag). Stem does some bytes => unicode normalization, which might
account for some of the difference, but other than that I'm at a loss
as to what would be taking the time.

Cheers! -Damian