[tor-commits] [stem/master] Tor descriptor lazy loading
atagar at torproject.org
Sun Jan 25 22:37:35 UTC 2015
commit 3dac7c51300062d78298b370b1286965652600e4
Merge: 92dd464 6484250
Author: Damian Johnson <atagar at torproject.org>
Date: Sun Jan 25 13:57:03 2015 -0800
Tor descriptor lazy loading
I've been wanting to do this for years.
Previously, reading a descriptor parsed every field in it. This is necessary if
we're validating it, but users usually don't care about validation and only
want an attribute or two.
When parsing without validation we now lazy load the document, meaning we
parse fields on demand rather than everything upfront. This greatly improves
our performance for reading descriptors without validation...
Server descriptors: 27% faster
Extrainfo descriptors: 71% faster
Microdescriptors: 43% faster
Consensus: 37% faster
This comes at a small cost when we read with validation (roughly 7-17% slower
in the results below), but not enough to be a concern. As an added benefit this
makes our code a lot more maintainable too!
https://trac.torproject.org/projects/tor/ticket/14011
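To illustrate what parsing fields on demand can look like, here is a minimal sketch. It is not stem's actual implementation: the LazyDescriptor class, its ATTRIBUTES table and the 'parsed' bookkeeping list are all hypothetical names made up for this example, and it assumes a simplified 'keyword value' line format.

```python
class LazyDescriptor(object):
  """Hypothetical descriptor that parses 'keyword value' lines on demand."""

  # attribute name => (keyword in the document, conversion function)
  ATTRIBUTES = {
    'nickname': ('router', lambda value: value.split(' ')[0]),
    'uptime': ('uptime', int),
  }

  def __init__(self, raw_content, validate = False):
    self._raw_content = raw_content
    self.parsed = []  # records which attributes were actually parsed

    if validate:
      # eager path: parse everything upfront so malformed content raises now
      for attr in self.ATTRIBUTES:
        getattr(self, attr)

  def __getattr__(self, name):
    # only called when normal lookup fails, so the attribute isn't parsed yet
    if name not in self.ATTRIBUTES:
      raise AttributeError(name)

    keyword, convert = self.ATTRIBUTES[name]
    value = None

    for line in self._raw_content.splitlines():
      if line.startswith(keyword + ' '):
        value = convert(line[len(keyword) + 1:])
        break

    self.parsed.append(name)
    setattr(self, name, value)  # cache so later reads skip __getattr__
    return value


desc = LazyDescriptor('router caerSidi 71.35.133.197 9001 0 0\nuptime 1234')
print(desc.uptime)  # parses only the 'uptime' line
print(desc.parsed)  # 'nickname' has not been touched
```

With validate = True the constructor touches every attribute, so a malformed field fails immediately, which mirrors the trade-off described above: validation needs everything parsed, while the unvalidated path only pays for what is read.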
--------------------------------------------------------------------------------
Benchmarking script
--------------------------------------------------------------------------------
import time
from stem.descriptor import parse_file

# read every descriptor with validation (all fields are parsed upfront)
start_time, fingerprints = time.time(), []
for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = True):
  fingerprints.append(desc.fingerprint)
count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors with validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))

# read them again without validation (fields are lazy loaded)
start_time, fingerprints = time.time(), []
for desc in parse_file('/home/atagar/.tor/cached-descriptors', validate = False):
  fingerprints.append(desc.fingerprint)
count, runtime = len(fingerprints), time.time() - start_time
print('read %i descriptors without validation, took %0.2f seconds (%0.5f seconds per descriptor)' % (count, runtime, runtime / count))
--------------------------------------------------------------------------------
Results
--------------------------------------------------------------------------------
Please keep in mind these are just the results on my system. These are, of
course, influenced by your system and background load...
Server descriptors:
  before: read 6679 descriptors with validation, took 10.71 seconds (0.00160 seconds per descriptor)
  before: read 6679 descriptors without validation, took 4.46 seconds (0.00067 seconds per descriptor)
  after: read 6679 descriptors with validation, took 11.48 seconds (0.00172 seconds per descriptor)
  after: read 6679 descriptors without validation, took 3.25 seconds (0.00049 seconds per descriptor)
Extrainfo descriptors:
  before: read 6677 descriptors with validation, took 7.91 seconds (0.00119 seconds per descriptor)
  before: read 6677 descriptors without validation, took 7.64 seconds (0.00114 seconds per descriptor)
  after: read 6677 descriptors with validation, took 8.91 seconds (0.00133 seconds per descriptor)
  after: read 6677 descriptors without validation, took 2.22 seconds (0.00033 seconds per descriptor)
Microdescriptors:
  before: read 10526 descriptors with validation, took 2.41 seconds (0.00023 seconds per descriptor)
  before: read 10526 descriptors without validation, took 2.34 seconds (0.00022 seconds per descriptor)
  after: read 10526 descriptors with validation, took 2.74 seconds (0.00026 seconds per descriptor)
  after: read 10526 descriptors without validation, took 1.34 seconds (0.00013 seconds per descriptor)
Consensus:
  before: read 6688 descriptors with validation, took 2.11 seconds (0.00032 seconds per descriptor)
  before: read 6688 descriptors without validation, took 2.04 seconds (0.00030 seconds per descriptor)
  after: read 6688 descriptors with validation, took 2.47 seconds (0.00037 seconds per descriptor)
  after: read 6688 descriptors without validation, took 1.28 seconds (0.00019 seconds per descriptor)
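For reference, the "faster" percentages quoted in the summary can be reproduced from the "without validation" timings above. This is a small sketch, assuming each percentage is the reduction in total runtime relative to the "before" figure; percent_reduction is a hypothetical helper, not part of stem.

```python
# (before, after) runtimes in seconds for reading without validation,
# copied from the results above
timings = [
  ('Server descriptors', 4.46, 3.25),
  ('Extrainfo descriptors', 7.64, 2.22),
  ('Microdescriptors', 2.34, 1.34),
  ('Consensus', 2.04, 1.28),
]

def percent_reduction(before, after):
  """Share of the original runtime that was saved, as a percentage."""
  return 100.0 * (before - after) / before

for name, before, after in timings:
  print('%s: %0.0f%% less time' % (name, percent_reduction(before, after)))
```

Rounded to whole numbers this gives 27, 71, 43 and 37 percent, matching the figures in the summary.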
stem/descriptor/__init__.py | 172 ++-
stem/descriptor/extrainfo_descriptor.py | 974 +++++++--------
stem/descriptor/microdescriptor.py | 122 +-
stem/descriptor/networkstatus.py | 1279 +++++++++-----------
stem/descriptor/router_status_entry.py | 737 +++++------
stem/descriptor/server_descriptor.py | 683 +++++------
test/unit/descriptor/extrainfo_descriptor.py | 28 +-
.../networkstatus/directory_authority.py | 9 +-
test/unit/descriptor/networkstatus/document_v3.py | 36 +-
.../descriptor/networkstatus/key_certificate.py | 24 +-
test/unit/descriptor/router_status_entry.py | 13 +-
test/unit/descriptor/server_descriptor.py | 10 +-
12 files changed, 1915 insertions(+), 2172 deletions(-)