[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors across CollecTor instances
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Sep 15 12:49:48 UTC 2016
#18910: distributing descriptors across CollecTor instances
-------------------------------+-----------------------------------
  Reporter:  iwakeh            |          Owner:  iwakeh
      Type:  enhancement       |         Status:  needs_information
  Priority:  High              |      Milestone:  CollecTor 1.1.0
 Component:  Metrics/CollecTor |        Version:
  Severity:  Normal            |     Resolution:
  Keywords:  ctip              |  Actual Points:
 Parent ID:                    |         Points:
  Reviewer:                    |        Sponsor:
-------------------------------+-----------------------------------
Comment (by karsten):
Hmm, the suggested config options would imply that there's only one new
sync manager module that syncs all descriptors from the various sources
and that runs, say, once per hour? I wonder how to schedule that in a way
that it does not interfere with the other modules. So far, modules were
pretty much independent, but this new module would create a dependency
between modules.
Alternative suggestion: we add four (sets of) configurations, one for each
module, that internally re-use the same code for syncing descriptors and
for importing them. For example, `SyncRelayDescriptors`,
`SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`. We
could then provide a remote path where descriptor files can be found (like
`/recent/relay-descriptors/`) and could implicitly only consider descriptor
types that the respective module understands (like
`RelayServerDescriptor`, `RelayExtraInfoDescriptor`, etc., but not
`BridgeServerDescriptor`).
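To illustrate the per-module idea, here is a minimal sketch in Python
(CollecTor itself is Java; this is only meant to show the shape of the
settings). The option names, the relay path, and the three descriptor type
names are taken from the suggestion above. The dictionary layout, the
bridge path, and the helper `accepted_for_module` are assumptions made for
illustration, and the relay type set is deliberately incomplete.

```python
# Hypothetical per-module sync settings. Only the option names, the relay
# path, and the three descriptor type names come from the suggestion above.
SYNC_MODULES = {
    "SyncRelayDescriptors": {
        "remote_path": "/recent/relay-descriptors/",
        # Incomplete on purpose: the suggestion lists these two "etc.".
        "accepted_types": {"RelayServerDescriptor", "RelayExtraInfoDescriptor"},
    },
    "SyncBridgeDescriptors": {
        "remote_path": "/recent/bridge-descriptors/",  # assumed path
        "accepted_types": {"BridgeServerDescriptor"},
    },
}


def accepted_for_module(module, descriptors):
    """Keep only (type_name, file_name) pairs the given module understands."""
    accepted = SYNC_MODULES[module]["accepted_types"]
    return [(type_name, file_name)
            for type_name, file_name in descriptors
            if type_name in accepted]
```

With this layout, each module would only ever see its own descriptor
types, even if a remote directory happened to contain others.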
(If we're worried that there are too many config options already, I'm more
than happy to make a list of options that can go away! But this shouldn't
mean we should hold back useful new options.)
Here's a potential policy we could apply to decide whether to keep a
local or remote descriptor: while syncing, if we find out that a remotely
obtained descriptor would be stored under a file name that already exists
locally, we always discard the remote copy; and while processing
descriptors locally, if we find that we already have a file locally with
different content, which we likely received while syncing, we always
overwrite the existing file. This means that syncing only ever adds data
and never replaces it.
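The policy could be sketched roughly as follows, in Python for brevity.
The function names `store_synced` and `store_local` are hypothetical and
not part of CollecTor; the point is only that the syncing path refuses to
touch an existing file, while the local processing path always writes.

```python
from pathlib import Path


def store_synced(path: Path, content: bytes) -> bool:
    """Store a remotely obtained descriptor unless the file name is taken.

    Returns True if the file was written, False if it was discarded.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" fails if the file already exists, so an existing
        # file is never replaced by a synced copy.
        with open(path, "xb") as out:
            out.write(content)
    except FileExistsError:
        return False
    return True


def store_local(path: Path, content: bytes) -> None:
    """Store a locally processed descriptor, overwriting any existing file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
```

Using exclusive creation rather than a separate existence check avoids a
race between the sync run and a module that writes the same file name at
the same time.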
Regarding deleting synced descriptors, we should never do that ourselves,
but should rather let `DescriptorCollector` clean up the local directory
when it finds that a local file no longer exists remotely.
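The check behind such a cleanup amounts to a set difference. The sketch
below is an illustration in Python only; it is not how
`DescriptorCollector` is implemented, and the function name
`extraneous_local_files` is hypothetical.

```python
def extraneous_local_files(local_files, remote_files):
    """Return local file names that no longer exist remotely, sorted."""
    return sorted(set(local_files) - set(remote_files))
```

Only the names returned here would be candidates for deletion; files that
exist remotely but not yet locally are left for the next sync run.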
Here's something else to watch out for while writing this code: whenever
we learn descriptors from syncing, we'll have to include them in our
`/recent/` directory, too. This wasn't entirely clear to me from the
description above, so if this was already the plan, never mind.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:14>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online