[tor-dev] Stem Proc Integration Tests
Damian Johnson
atagar at torproject.org
Fri Jun 29 16:27:34 UTC 2012
> Keep in mind that metrics tarballs can be huge. stem's tests probably
> shouldn't download one or more of these tarballs in an automatic integ
> test run.
Oops, yup. I should have mentioned that. We're just picking out a
descriptor that seems to exercise most of the parsing code. It's only
a sanity check that 'we can still parse something found in the
wild'. Megan, Erik: the layout should be pretty obvious when you take
a peek in test/integ/descriptor/data/*.
> The Java metrics-lib doesn't
> understand microdescriptor consensuses, because they don't contain
> anything new for statistical analysis, but I think stem will want to
> parse them.
Definitely. Microdescriptors are available via the control protocol, so
we need to be able to parse them.
> It probably makes sense to have an abstract
> NetworkStatusEntry class that does most of the parsing work but that can
> be specialized in its subclasses. Picking names like ConsensusEntry if
> the consensus class is called Consensus makes sense.
Perfect, thanks. Megan, Erik: if I were in your shoes, the first thing
I'd do to approach this is propose the following on this list:
- an object hierarchy (we already have a bit of one, e.g.
ServerDescriptor vs RelayDescriptor/BridgeDescriptor)
- a description for each of the classes, preferably something meaty
that we can use for the pydocs of each class with the :var: entries
- your thoughts on which parsing logic should go where (look at the
previous descriptor classes for a pattern that you might want to
follow)
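To make the request above a bit more concrete, here's a minimal sketch of one possible hierarchy. It's only an illustration, not a proposal for the final design. It assumes the base class parses the lines every entry shares while each subclass handles its own "r" line layout, and it only covers the "r", "s" and "m" lines. NetworkStatusEntry and ConsensusEntry are the names suggested above; MicrodescriptorConsensusEntry, the attribute names and the _parse_* hooks are hypothetical.

```python
class NetworkStatusEntry(object):
    """
    Common base for one router entry in a network status document.

    :var str nickname: relay nickname
    :var str identity: relay identity, base64-encoded as in the document
    :var str address: IPv4 address of the relay
    :var list flags: flags assigned to the relay, such as 'Exit' or 'Fast'
    """

    def __init__(self, raw_contents):
        self.nickname = None
        self.identity = None
        self.address = None
        self.flags = []

        # dispatch each "keyword arg arg..." line to the parsing hooks
        for line in raw_contents.strip().splitlines():
            keyword, _, value = line.partition(' ')
            self._parse_line(keyword, value.split())

    def _parse_line(self, keyword, args):
        # lines whose handling is shared by every kind of entry
        if keyword == 'r':
            self._parse_r_line(args)
        elif keyword == 's':
            self.flags = args

    def _parse_r_line(self, args):
        # the "r" line layout differs between document types
        raise NotImplementedError('subclasses must parse their own "r" line')


class ConsensusEntry(NetworkStatusEntry):
    """
    Entry from a normal consensus.

    :var str digest: digest of the relay's server descriptor
    """

    def __init__(self, raw_contents):
        self.digest = None
        super(ConsensusEntry, self).__init__(raw_contents)

    def _parse_r_line(self, args):
        # r nickname identity digest YYYY-MM-DD HH:MM:SS address or_port dir_port
        self.nickname, self.identity, self.digest = args[0:3]
        self.address = args[5]


class MicrodescriptorConsensusEntry(NetworkStatusEntry):
    """
    Entry from a microdescriptor consensus (hypothetical class name).

    :var str microdescriptor_digest: digest from the "m" line
    """

    def __init__(self, raw_contents):
        self.microdescriptor_digest = None
        super(MicrodescriptorConsensusEntry, self).__init__(raw_contents)

    def _parse_r_line(self, args):
        # same as a consensus "r" line, minus the server descriptor digest
        self.nickname, self.identity = args[0:2]
        self.address = args[4]

    def _parse_line(self, keyword, args):
        # "m" lines only exist in microdescriptor consensuses
        if keyword == 'm':
            self.microdescriptor_digest = args[0]
        else:
            super(MicrodescriptorConsensusEntry, self)._parse_line(keyword, args)
```

Keeping the differing "r" layouts in the subclasses means a new document type only needs a new subclass rather than more branches in the shared parsing code.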
> If there's a
> similar concept to Java's inner classes in Python, maybe using something
> like Consensus.Entry might be a good choice, too, because this class
> will only be used as part of a Consensus.
Yup, there is.
>>> class Foo:
...   class Bar:
...     def __init__(self):
...       self.my_value = 5
...   def __init__(self):
...     self.my_bar = Foo.Bar()
...
>>> f = Foo()
>>> f.my_bar.my_value
5
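Applied to the naming suggested above, a sketch could look like the following. One difference from Java is worth keeping in mind: a Python nested class behaves more like a Java static nested class, so an Entry gets no implicit reference to the Consensus that created it, and here the parent is passed in explicitly. Consensus, Entry and their attributes are hypothetical.

```python
class Consensus(object):
    """Hypothetical consensus document holding its router entries."""

    class Entry(object):
        def __init__(self, consensus, nickname):
            # no implicit link to the enclosing instance, so keep one
            # explicitly in case the entry needs its parent document
            self.consensus = consensus
            self.nickname = nickname

    def __init__(self, nicknames):
        self.entries = [Consensus.Entry(self, name) for name in nicknames]


consensus = Consensus(['alice', 'bob'])
print(consensus.entries[0].nickname)                # alice
print(consensus.entries[1].consensus is consensus)  # True
```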
> A related question: can you give us a couple of use-cases for the export functionality? E.g., is filtering (we only want fields X, Y, and Z when Q = ...) likely to be of use? Anything beyond just a straight dump of descriptor/network status/etc entries?
I'll mostly leave this question to Fabio, since the CSV dumping
functionality was his idea, but here are my thoughts on some use
cases:
- user writes a script that has stem parse the descriptors, filters the
results (say, down to Syrian exit relays), then dumps them to a CSV so
they can make pretty graphs or do other analysis of the data
- user has a Python script that hourly parses their cached descriptors
to find any new exits that only allow plaintext traffic, then dumps just
the fingerprint and IP to a CSV so they can later be scanned for
malicious activity
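Both use cases boil down to "filter, then write selected fields". A minimal sketch using only the standard csv module, assuming descriptors expose their values as attributes. The Relay tuple, its field names and export_csv() are hypothetical stand-ins rather than stem's API, and the country field assumes something like a GeoIP lookup was done beforehand.

```python
import csv
import io
from collections import namedtuple

# hypothetical stand-in for a parsed descriptor; 'country' assumes a
# separate GeoIP lookup since descriptors don't carry it themselves
Relay = namedtuple('Relay', ['fingerprint', 'address', 'country', 'is_exit', 'plaintext_only'])


def export_csv(relays, fields, predicate, output):
    """Write the selected fields of every relay matching predicate as CSV."""
    writer = csv.writer(output)
    writer.writerow(fields)

    for relay in relays:
        if predicate(relay):
            writer.writerow([getattr(relay, field) for field in fields])


relays = [
    Relay('AAAA', '192.0.2.1', 'sy', True, True),
    Relay('BBBB', '192.0.2.2', 'de', True, False),
    Relay('CCCC', '192.0.2.3', 'sy', False, False),
]

# second use case: exits that only allow plaintext, fingerprint and IP only
plaintext_exits = io.StringIO()
export_csv(relays, ['fingerprint', 'address'],
           lambda r: r.is_exit and r.plaintext_only, plaintext_exits)

# first use case: Syrian exit relays
syrian_exits = io.StringIO()
export_csv(relays, ['fingerprint', 'country'],
           lambda r: r.is_exit and r.country == 'sy', syrian_exits)

print(plaintext_exits.getvalue())
```

Passing the filter as a plain callable keeps field selection and filtering independent, which covers the "only fields X, Y, and Z when Q = ..." question without a dedicated query syntax.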
> Please use the built-in function vars() instead of __dict__ to retrieve
> instance attributes.
Ah ha, thanks.
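For reference, vars(obj) returns the object's __dict__ mapping, so it yields the same instance attributes without reaching for the dunder directly. It only covers instance attributes (not class attributes or properties) and raises TypeError for objects without a __dict__. A small sketch, where ExampleDescriptor is a hypothetical stand-in, that feeds the result to csv.DictWriter for a straight dump:

```python
import csv
import io


class ExampleDescriptor(object):
    """Hypothetical descriptor with a few instance attributes."""

    def __init__(self):
        self.nickname = 'example'
        self.address = '192.0.2.1'
        self.or_port = 9001


desc = ExampleDescriptor()

# same mapping as desc.__dict__, retrieved through the public built-in
attributes = vars(desc)

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=sorted(attributes))
writer.writeheader()
writer.writerow(attributes)

print(output.getvalue())
```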