[metrics-bugs] #27076 [Metrics/CollecTor]: Reconfigure collector2.tp.o to do less
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Aug 8 08:48:06 UTC 2018
#27076: Reconfigure collector2.tp.o to do less
-----------------------------------+--------------------------
Reporter: karsten | Owner: metrics-team
Type: task | Status: new
Priority: Medium | Milestone:
Component: Metrics/CollecTor | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
-----------------------------------+--------------------------
We have two CollecTor instances: collector.tp.o on colchicifolium and
collector2.tp.o on corsicum. We run two instances instead of one for
failure tolerance:
1. Whenever collector.tp.o fails, it doesn't fetch consensuses and votes
from the directory authorities, and those are only available for an hour.
If collector.tp.o fails for a couple of hours, it can later fetch the
missing descriptors from collector2.tp.o.
2. While collector.tp.o is down, Onionoo can fetch relay descriptors from
collector2.tp.o and continue to provide recent data.
However, I think we went a bit too far when configuring collector2.tp.o to
also sync descriptors from collector.tp.o. It does that with bridge
descriptors and sanitized web logs.
Here's how the two instances are currently configured:
{{{
collector.tp.o/colchicifolium:
RelaySources = Cache, Remote, Sync, Local
BridgeSources = Local
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Local
collector2.tp.o/corsicum:
RelaySources = Remote
BridgeSources = Sync
ExitlistSources = Remote
OnionPerfSources = Remote
WebstatsSources = Sync
}}}
The problem is the two `Sync` entries in the collector2.tp.o block. I think
we mainly added them to exercise the respective sync code paths, so that
we would notice any issues with them.
I now believe that these entries are not helpful and potentially harmful,
for several reasons:
1. The sync mode of the bridgedescs module does not clean up the
`recent/` directory after placing descriptors there, whereas the local
mode does. As a result, bridge descriptors pile up in `recent/` and fill
up disk space. Even worse, Onionoo fetches everything in that directory,
so bootstrapping a new Onionoo instance currently downloads vast amounts
of data.
2. I don't yet know what happened in #27055, but simplifying the
configuration of collector2.tp.o should at least make that issue less
likely to happen again.
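To make item 1 concrete, here is a minimal sketch of the kind of cleanup step the sync mode lacks. It is not CollecTor's code (CollecTor is written in Java). The function name `clean_up_recent` is hypothetical, and the 72-hour retention window is an assumption about how long descriptors are meant to stay in `recent/`.

```python
import os
import time
from pathlib import Path

# Assumption: descriptors should stay in recent/ for 72 hours.
# Adjust to whatever window the local mode actually enforces.
RETENTION_SECONDS = 72 * 60 * 60


def clean_up_recent(recent_dir, now=None, retention=RETENTION_SECONDS):
    """Hypothetical helper: delete files under recent_dir whose
    modification time is older than the retention window.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so we don't delete while walking.
    for path in sorted(Path(recent_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > retention:
            path.unlink()
            removed.append(path)
    return removed
```

Calling a helper like this at the end of each sync run, with the same window the local mode uses, would keep `recent/` from growing without bound. It would not address the Onionoo bootstrapping cost for descriptors that have already piled up.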
I could imagine reconfiguring collector2.tp.o to only perform the
following tasks:
{{{
collector2.tp.o/corsicum:
RelaySources = Remote
ExitlistSources = Remote
}}}
We would keep the failure-tolerance properties described above and drop
everything else.
Does that make sense? Did I miss anything important here?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/27076>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online