[tor-dev] Making microdescriptor tarballs available on metrics.tpo
Karsten Loesing
karsten at torproject.org
Wed Jan 22 08:01:37 UTC 2014
On 1/22/14 4:32 AM, Damian Johnson wrote:
>> Damian, can you try to parse these descriptors using stem, to see if the
>> descriptor annotations are correct and if stem can parse them without
>> issues?
>
> Hi Karsten, sorry about the delay! Yup, stem parses them just fine
> (though processing compressed tarballs still takes an unpleasantly
> long time)...
>
>
> % du -h microdescs-2014-01.tar.bz2
> 1.8M microdescs-2014-01.tar.bz2
>
>
> % cat parse.py
> from stem.descriptor.reader import DescriptorReader
>
> counter = 0
>
> with DescriptorReader(["microdescs-2014-01.tar.bz2"]) as reader:
>   for desc in reader:
>     counter += 1
>
> print "Found %i microdescriptors" % counter
>
>
> % time python parse.py
> Found 14999 microdescriptors
>
> real 67m15.022s
> user 65m50.259s
> sys 1m13.717s
Wow, that's indeed time-consuming. Inflating the tarball before feeding
it into stem probably solves this problem. (That's what I usually do
with metrics-lib, too.)
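
A minimal sketch of that workaround, assuming the tarball comes from a trusted source and that there is enough disk space for the extracted files. The helper names inflate_tarball and count_descriptors are hypothetical, made up for this illustration. Only the DescriptorReader usage is taken from the script quoted above, and the timing benefit has not been measured here.

```python
import os
import tarfile
import tempfile


def inflate_tarball(tarball_path, target_dir=None):
    """Extract a (possibly compressed) tarball and return the output directory.

    Hypothetical helper: only use it on archives you trust, since
    extractall() writes whatever paths the archive contains.
    """
    if target_dir is None:
        target_dir = tempfile.mkdtemp(prefix="microdescs-")
    os.makedirs(target_dir, exist_ok=True)

    # The default mode "r" detects gzip/bzip2/xz compression transparently.
    with tarfile.open(tarball_path) as archive:
        archive.extractall(target_dir)

    return target_dir


def count_descriptors(directory):
    """Count descriptors found in an already extracted directory."""
    # Imported here so inflate_tarball() works even without stem installed.
    from stem.descriptor.reader import DescriptorReader

    counter = 0
    with DescriptorReader([directory]) as reader:
        for desc in reader:
            counter += 1
    return counter
```

Usage would be something like count_descriptors(inflate_tarball("microdescs-2014-01.tar.bz2")), so that stem reads plain files from disk rather than the compressed archive.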
Thanks for testing this! Will deploy the metrics-db changes on yatei.
All the best,
Karsten