[tor-bugs] #9380 [BridgeDB]: BridgeDB should use Stem for parsing descriptors
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sat Mar 21 05:36:04 UTC 2015
#9380: BridgeDB should use Stem for parsing descriptors
-------------------------+-------------------------------------------------
Reporter: sysrqb | Owner: isis
Type: | Status: closed
enhancement | Milestone:
Priority: blocker | Version:
Component: | Keywords: stem, bridgedb-0.3.0, bridgedb-
BridgeDB | parsers, isis2014Q3Q4, isisExB
Resolution: fixed | Parent ID:
Actual Points: |
Points: |
-------------------------+-------------------------------------------------
Changes (by isis):
* status: needs_review => closed
* resolution: => fixed
Comment:
Replying to [comment:40 atagar]:
> Hi Isis, here's some quick feedback...
>
Thanks, atagar!
> {{{
> from stem.descriptor import extrainfo_descriptor
> from stem.descriptor import server_descriptor
> }}}
>
> Unused.
Removed. Thanks!
> {{{
> def parseServerDescriptorsFile(filename, validate=True):
> """Parse a file which contains ``@type bridge-server-descriptor``s.
>
> .. note:: ``validate`` defaults to ``False`` because there appears
to be a
> bug in Leekspin, the fake descriptor generator, where Stem
thinks the
> fingerprint doesn't match the key…
>
> .. note:: We have to lie to Stem, pretending that these are ``@type
> server-descriptor``s, **not** ``@type bridge-server-
descriptor``s.
> See ticket `#11257`_.
>
> .. _`#11257`: https://trac.torproject.org/projects/tor/ticket/11257
>
> :param str filename: The file to parse descriptors from.
> :param bool validate: Whether or not to validate descriptor
> contents. (default: ``False``)
> :rtype: list
> :returns: A list of
> :api:`stem.descriptor.server_descriptor.RelayDescriptor`s.
> """
> logging.info("Parsing server descriptors with Stem: %s" % filename)
> descriptorType = 'server-descriptor 1.0'
> document = parse_file(filename, descriptorType, validate=validate)
> routers = list(document)
> return routers
> }}}
>
> This whole function is just 25 lines for doing...
>
> {{{
> routers = list(parse_file(filename, 'server-descriptor 1.0', validate =
validate))
> }}}
>
> Really doesn't seem like it's providing any value.
Yep… but adding a log statement in there makes it satisfy #9377. And
documentation is important (rather than just importing a magickal function
from Stem without explaining what is happening). And someday, with #12506,
BridgeDB's database manager is going to do this parsing, so having
everything all neatly contained within the `bridgedb.parse.descriptors`
module will come it handy when it comes time to separate (rather than just
calling `list(parse_file([…]))` in `bridgedb.Main.load()`.
> {{{
> lib/bridgedb/configure.py:
>
> def loadConfig(configFile=None, configCls=None):
> """Load configuration settings on top of the current settings.
> ...
> }}}
>
> You might want to take a look at...
>
> https://stem.torproject.org/api/util/conf.html
>
> This is a module used by stem and arm so they can have nice config
files...
>
> https://gitweb.torproject.org/stem.git/tree/test/settings.cfg
> https://gitweb.torproject.org/arm.git/tree/armrc.sample
>
> It provides type inferencing, defaults, etc without doing exec() as you
presently are (... which is almost never a great idea). You can then use
config_dict() to specify the config values each file cares about and their
default values. For example...
>
> https://gitweb.torproject.org/arm.git/tree/arm/header_panel.py#n27
>
> Then when the config for that singleton is read all the configurations
get the updated values.
Awesome! Yes, BridgeDB had a particularly nasty manner of handling config
files when I started working on it. Not to mention that it made the
servers reset themselves to whatever settings the config file had when
they were first started (#9277). I didn't change the config parsing much,
I just hacked in the parsing for updating configs while running, and
(de)serialising them.
If you or someone else ever feels like hacking on making BridgeDB's
configs saner… I'd be pretty happy about it. Otherwise I'll likely check
out your Stem config utility at some point.
> {{{
> document = parse_file(filename, descriptorType, validate=validate)
>
> try:
> for router in document:
> descriptors.append(router)
> except (ValueError, ProtocolError) as error:
> logging.error(
> ("Stem exception while parsing extrainfo descriptor from "
> "file '%s':\n%s") % (filename, str(error)))
> _copyUnparseableDescriptorFile(filename)
> }}}
>
> No reason for the for loop, same as...
>
> {{{
> descriptors = list(parse_file(filename, descriptorType,
validate=validate))
> }}}
I recall that that didn't work for catching the unparseable descriptors
and saving them to separate files and/or that it broke my integration
tests? I'll have to look at that more.
> {{{
> def parseNetworkStatusFile(filename, validate=True,
skipAnnotations=True,
> descriptorClass=RouterStatusEntryV3):
> """Parse a file which contains an ``@type bridge-networkstatus``
document.
>
> See `ticket #12254 <https://bugs.torproject.org/12254>`__ for why
> networkstatus-bridges documents don't look anything like the
networkstatus
> v2 documents that they are purported to look like. They are missing
all
> headers, and the entire footer including authority signatures.
> }}}
>
> Actually, you could probably simply parse it like a normal network
status document but without validation.
Validation seems like a good idea?
Someday I'll just fix the BridgeAuthority. Or kill it, since it basically
does little more than collecting everything that comes its way and then
shoving an everything.tar.gz at BridgeDB.
---
I've included fixes for some of your suggestions in the same
`fix/9380-stem_r10` branch.
Thanks, atagar!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9380#comment:42>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list