[tor-bugs] #22141 [Metrics/metrics-lib]: Deprecate `DescriptorFile` and add relevant information to `Descriptor`
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Jun 14 09:42:09 UTC 2017
#22141: Deprecate `DescriptorFile` and add relevant information to `Descriptor`
---------------------------------+-----------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_review
Priority: Medium | Milestone: metrics-lib 1.9.0
Component: Metrics/metrics-lib | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
---------------------------------+-----------------------------------
Comment (by iwakeh):
Replying to [comment:13 karsten]:
> I'm afraid I found two flaws in the design above and the suggestion on
#22196:
>
> 1. We're using the `Iterable<Descriptor>` and `InvalidDescriptor` as
one possible returned element as channel for all kinds of exceptions
thrown while reading descriptor files. But we don't have such a channel
in `DescriptorCollector#collectDescriptors` which returns, well, `void`.
That means that the design above doesn't solve anything of
[https://trac.torproject.org/projects/tor/ticket/16225#comment:7 this
issue raised on the related ticket #16225]. And ideally we'd handle
exceptions in the same way in all descriptor sources.
`DescriptorCollectorImpl` is deprecated:
{{{
* @deprecated Replaced by {@link DescriptorIndexCollector} which uses the
* remote instance's index.json file as a more robust alternative to
parsing
* the remote instance's directory listings.
}}}
Using the `DescriptorIndexCollector` implementation no descriptors are
parsed during download.
What did I overlook?
> 1. The idea of replacing the parse history with a `minLastModified`
timestamp doesn't handle I/O problems very well. For example, in the
current implementation, if there's a network problem with downloading a
potentially large file from CollecTor, we would not include that in the
parse history and retry collecting and reading it next time. But with
only a timestamp we would simply skip that descriptor file, which is
pretty bad.
>
> New plan:
> - We rename `InvalidDescriptor` to `UnparseableDescriptor` and let it
return the `DescriptorParseException` that made it unparseable in our
view. This instance ''might'' be useful to the application, at least by
containing the raw descriptor `byte[]` or descriptor `File` reference.
And if the application produced the input descriptor itself, like
CollecTor, knowing about it being unparseable is more important than for a
consumer, like Onionoo!
I assumed that InvalidDescriptor would provide the bytes and other
descriptor meta-data, the renaming makes sense.
> - We leave the parse history in place and postpone the idea of
stateless, overloaded methods. Sad.
Not so sad, actually. From an API point of view hiding away the history
bookkeeping reduces complexity.
> - We add a method `removeFromHistoryFile(File)` that can be called by
the application if it wants to reprocess a descriptor file containing an
`UnparseableDescriptor`. This would not be necessary for I/O exceptions,
because those descriptor files would not go into the parse history and
retried in the next execution anyway.
> - We log any exceptions on `warn` level that we caught while collecting
or reading or parsing descriptors. The application can't ''do'' anything
to handle these issues anyway, except for telling the operator that
something went wrong. `DescriptorParseException`s thrown while parsing
invalid/unparseable descriptors are exempt from this and will only be
logged on info level.
I think INFO logging to the regular log could bloat logs unnecessarily;
more below.
> - We ''could'' introduce an `ExceptionListener` for exceptions are than
`DescriptorParseException`, or the "ParseExceptionsLog" that you mentioned
(even though I'm not entirely certain what that would be). I don't think
it's necessary though, because we wouldn't it use that ourselves, and we
don't know whether it would be used by anyone.
I think an exception listener is not really necessary. The 'logging
channel' suggestion was meant as introducing another Logger as for example
is used in Onionoo for statistics:
`LoggerFactory.getLogger("statistics")...`.
Such a `LoggerFactory.getLogger("ParseExceptionsLog")` for all parsing
problems would 'channel' these log statements and the final receiver (be
it log or /dev/null) can be changed via runtime configuration.
Does that make sense?
>
> Before I write more code (or longer comments!), what do you think? :)
I didn't look at the branch below, yet. Should I now?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22141#comment:15>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list