[metrics-bugs] #23421 [Metrics/CollecTor]: Use persistence functionality throughout all modules
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Nov 22 10:20:48 UTC 2017
#23421: Use persistence functionality throughout all modules
-------------------------------+-----------------------------------
Reporter: iwakeh | Owner: metrics-team
Type: enhancement | Status: needs_information
Priority: High | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: metrics-2017 | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------+-----------------------------------
Comment (by karsten):
Replying to [comment:9 iwakeh]:
> The thought that invalid descriptors are mainly due to CollecTor's
> parsing mechanism not recognizing them as valid is a good point in favor
> of storing and syncing invalid descriptors.
> There might be invalid descriptors - mangled or not complying with the
> spec - but even these will be useful for analysis and troubleshooting.
> As we only sync between highly trusted instances, the possibility of
> maliciously malformed descriptors can be ruled out (well, if that happens
> there is another bigger problem to deal with).
> So, given that syncing only takes place between trusted instances and
> data loss is the main evil to prevent, the sync&store-all approach is fine:
> Only during import of sensitive data are descriptors that cannot be
> sanitized skipped; other than that, all descriptors are stored.
Makes sense to me.
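
To make the agreed policy concrete, here is a minimal sketch of "store
everything, skip only what cannot be sanitized". It is illustrative Python,
not CollecTor code (CollecTor is written in Java), and every name in it
(Descriptor, import_descriptors, the sanitize callback, the sensitive and
valid flags) is hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Descriptor:
    raw: bytes
    sensitive: bool = False  # hypothetical flag: must be sanitized before storing
    valid: bool = True       # hypothetical flag: whether parsing recognized it


def import_descriptors(
    descriptors: Iterable[Descriptor],
    sanitize: Callable[[bytes], Optional[bytes]],
) -> List[Descriptor]:
    """Return the descriptors that would be stored.

    Invalid descriptors are kept on purpose. The only descriptors dropped
    are sensitive ones for which sanitize() returns None.
    """
    stored: List[Descriptor] = []
    for desc in descriptors:
        if not desc.sensitive:
            # Valid or not, non-sensitive descriptors are always stored.
            stored.append(desc)
            continue
        sanitized = sanitize(desc.raw)
        if sanitized is None:
            # Cannot be sanitized: this is the single case that is skipped.
            continue
        stored.append(Descriptor(sanitized, sensitive=False, valid=desc.valid))
    return stored
```

Note that the valid flag never influences whether a descriptor is stored;
it is only carried along so that later analysis can tell the cases apart.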
> Possible next steps (if we agree on the above):
> 1) Make the webstats module use the above approach from the beginning; if it
> seems easier, also immediately change the overall sync process.
> 2) Unless the change was made for all modules in step one, make the entire
> sync process keep all descriptors.
> 3) Change and adapt all other CollecTor modules accordingly, using
> persistence classes throughout.
How about we don't enable syncing for the new webstats module at all?
Let's face it, it's not a use case we're planning to support, so why
should we write or keep the necessary code to do it?
And going even one step further (out of scope for this ticket), how about
we disable syncing for all other modules ''except'' for the relaydescs
module where we turn it into yet another data source like downloading from
the authorities or reading from cached descriptors files? We could still
keep the code in a form that we can re-use in other modules in the
future, but only as long as that doesn't make the overall code more
complex than it has to be. But: new ticket. Just writing this here to
discuss the general direction.
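
One way to picture "syncing as yet another data source" for relaydescs is
the following rough sketch. Again, this is illustrative Python rather than
CollecTor's Java code; the collect function, the Source type, and the
source names are hypothetical, and deduplicating by SHA-256 of the raw
bytes is an assumption made for the example, not something decided in
this ticket.

```python
import hashlib
from typing import Callable, Dict, Iterable

# A data source is anything that yields raw descriptor bytes when called.
Source = Callable[[], Iterable[bytes]]


def collect(sources: Dict[str, Source]) -> Dict[str, bytes]:
    """Merge descriptors from all sources, keeping the first copy of each.

    Every source is treated the same way, so a sync from another instance
    needs no special handling compared to downloads or cached files.
    """
    seen: Dict[str, bytes] = {}
    for _name, fetch in sources.items():
        for raw in fetch():
            digest = hashlib.sha256(raw).hexdigest()
            # setdefault keeps the copy from whichever source came first.
            seen.setdefault(digest, raw)
    return seen


# Hypothetical wiring: three sources with overlapping content.
example_sources: Dict[str, Source] = {
    "authorities": lambda: [b"desc-a", b"desc-b"],
    "cached-files": lambda: [b"desc-b", b"desc-c"],
    "sync": lambda: [b"desc-a", b"desc-c"],
}
merged = collect(example_sources)
```

With this shape, dropping or adding the sync source is a one-line change
in the wiring, which fits the goal of not making the overall code more
complex than it has to be.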
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23421#comment:10>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online