[tor-bugs] #6569 [Stem]: Stem network status document parsing
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Tue Aug 14 16:37:19 UTC 2012
#6569: Stem network status document parsing
--------------------+-------------------------------------------------------
Reporter: neena | Owner: neena
Type: task | Status: needs_revision
Priority: normal | Milestone:
Component: Stem | Version:
Keywords: | Parent:
Points: | Actualpoints:
--------------------+-------------------------------------------------------
Changes (by atagar):
* status: needs_review => needs_revision
Comment:
Bit more feedback from looking it over this morning. I'm gonna swap this
back into needs_revision until these are addressed.
> stem/descriptor/__init__.py
> # Metrics descriptor handling. These contain a single descriptor per
file.
This comment of mine is no longer accurate with this addition. Maybe this
would now be a little nicer as chained iterators...
{{{
def parse_file(path, descriptor_file):
...
filename, file_parser = os.path.basename(path), None
if filename == "cached-descriptors":
file_parser = stem.descriptor.server_descriptor.parse_file
elif filename == "cached-extrainfo":
file_parser = stem.descriptor.extrainfo_descriptor.parse_file
else:
# check if this is a metrics descriptor
metrics_header_match = re.match("^@type (\S+) (\d+).(\d+)$",
first_line)
if metrics_header_match:
desc_type, major_version, minor_version =
metrics_header_match.groups()
file_parser = lambda f: _parse_metrics_file(desc_type,
int(major_version), int(minor_version), f)
if file_parser:
for desc in file_parser(descriptor_file):
desc._set_path(path)
yield desc
return
else:
raise TypeError("Unable to determine the descriptor's type. filename:
'%s', first line: '%s'" % (filename, first_line))
def _parse_metrics_file(descriptor_type, major_version, minor_version,
descriptor_file):
# Parses descriptor files from metrics, yielding individual descriptors.
This
# throws a TypeError if the descriptor_type or version isn't recognized.
if descriptor_type == "server-descriptor" and major_version == 1:
yield
stem.descriptor.server_descriptor.RelayDescriptor(descriptor_file.read())
elif descriptor_type == "bridge-server-descriptor" and major_version ==
1:
yield
stem.descriptor.server_descriptor.BridgeDescriptor(descriptor_file.read())
... etc...
elif desc_type in ("network-status-consensus-3", "network-status-
vote-3") and major_version == 1:
# TODO: this upfront read still makes me twitch
desc =
stem.descriptor.networkstatus.NetworkStatusDocument(descriptor_file.read())
for desc in desc.router_descriptors:
yield desc
else:
raise TypeError("Unrecognized metrics descriptor, type: %s, major
version: %i, minor version: %i" % (descriptor_type, major_version,
minor_version))
}}}
> +def _peek_keyword(descriptor_file):
> ...
> + last_position = descriptor_file.tell()
> + line = descriptor_file.readline()
> + if not line: return None
Lets use _peek_line() here. We could probably replace this function
with...
{{{
def _peek_keyword(descriptor_file):
...
line = _peek_line(descriptor_file)
if line.startswith("opt "):
line = line[4:]
if line:
return line.split(' ', 1)[0]
else:
return None
}}}
> +def _read_keyword_line(keyword, descriptor_file, validate = True,
optional = False):
> ...
> + :param str keyword: keyword the line must begin with
> + :param bool optional: if the current line must begin with the given
keyword
> + :param bool validate: validation is enabled
Missing a parameter and out of order. The opt handling in this method is
more complicated than it needs to be - like _peek_keyword() lets just
strip it off of the line variable if it's there.
> + elif line.startswith("opt"):
> + # if this is something new we don't recognize
> + # ignore it and go to the next line
> + descriptor_file.readline()
> + return _read_keyword_line(keyword, descriptor_file, optional)
This doesn't look right. I think you meant "elif optional:", having this
start with 'opt' really doesn't mean anything.
Also, when checking for an 'opt' prefix please do a startswith() check for
'opt ' so you don't match against, say, our new "optometrist" keyword. :P
> +def _read_keyword_line(keyword, descriptor_file, validate = True,
optional = False):
> + """
> + Returns the rest of the line if the first keyword matches the given
keyword. If
> + it doesn't, a ValueError is raised if optional and validate are True,
if
> + not, None is returned.
> +
> + Respects the opt keyword and returns the next keyword if the first is
"opt".
>
> ... vs...
>
> +def _read_keyword_line_str(keyword, lines, validate = True, optional =
False):
> + """
> + Returns the rest of the line if the first keyword matches the given
keyword. If
> + it doesn't, a ValueError is raised if optional and validate are True,
if
> + not, None is returned.
> +
> + Respects the opt keyword and returns the next keyword if the first is
"opt".
These pydocs look really identical to me. How to these functions differ?
Can we get rid of _read_keyword_line_str()?
> +def _read_until_keywords(keywords, descriptor_file, inclusive = False,
ignore_first = False):
> ...
> + :param bool ignore_first: doesn't check if the first line read has
one of the given keywords
This new parameter seems oddly specific. The problem that you're trying to
solve is that a router status entry starts with a 'r ' line and continues
until... well, that part isn't defined in the spec. At present you're
hardcoding our check to be for...
> +_router_desc_end_kws = ["r", "bandwidth-weights", "directory-footer",
"directory-signature"]
This strikes me as being both a bit hacky and vulnerable to breakage if
tor adds a new footer value. I'd suggest asking for clarification from
Nick - this looks like a spec bug to me.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6569#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list