[metrics-bugs] #23421 [Metrics/CollecTor]: Use persistence functionality throughout all modules
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Nov 3 09:02:19 UTC 2017
#23421: Use persistence functionality throughout all modules
-------------------------------+-----------------------------------
 Reporter:  iwakeh             |          Owner:  metrics-team
     Type:  enhancement        |         Status:  needs_information
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:  metrics-2017       |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+-----------------------------------
Comment (by karsten):
Replying to [comment:1 iwakeh]:
> Here's an overview (cf. #21759 comment:12 following):
Some quick thoughts:
> * bridge-desc (all types): after sanitation the descriptor is
>   written; if one descriptor cannot be sanitized, it is skipped
Sounds reasonable. The decision whether to skip a descriptor or not should
remain outside of the persistence module. We could easily remove
synchronization functionality, because a CollecTor instance either
sanitizes bridge descriptors or copies them over from another instance,
but not both. But I don't mind keeping it as long as it doesn't get in the
way by making designs more complex than they have to be.
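To illustrate the split between sanitizer and persistence, here is a minimal sketch. It is in Python for brevity (CollecTor itself is Java), and every name in it (`InMemoryStore`, `sanitize_all`, the `sanitize` callback) is made up for this example, not existing CollecTor API. The point is only that the skip decision lives in the caller and the store never rejects anything:

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class InMemoryStore:
    """Hypothetical persistence layer: stores whatever it is handed."""

    def __init__(self) -> None:
        self.files = {}  # type: Dict[str, bytes]

    def store(self, path: str, content: bytes) -> None:
        # No validation here on purpose; deciding what to keep is not
        # the store's job.
        self.files[path] = content


def sanitize_all(
    raw_descriptors: Iterable[Tuple[str, bytes]],
    sanitize: Callable[[bytes], Optional[bytes]],
    store: InMemoryStore,
) -> List[str]:
    """Sanitize descriptors one by one and store the ones that succeed.

    `sanitize` is assumed to return None when a descriptor cannot be
    sanitized. Returns the paths of the skipped descriptors.
    """
    skipped = []
    for path, raw in raw_descriptors:
        sanitized = sanitize(raw)
        if sanitized is None:
            # The caller decides to skip; the store is never asked.
            skipped.append(path)
            continue
        store.store(path, sanitized)
    return skipped
```

With this shape, an instance that only copies already sanitized descriptors from another instance could hand them to the same `store` directly, without any sanitizer in between.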
> * relay-desc (all types): descriptors written one by one skipping
>   problematic ones
I wonder if we should take out the part where we're skipping problematic
descriptors, so that we handle descriptors coming from directory
authorities and from other CollecTor instances the same way. We only need
basic things like publication time, descriptor digest, etc. to determine
file names. But maybe it shouldn't be on us to decide about rejecting a
descriptor. Needs discussion.
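As a rough sketch of how little is needed to place a descriptor: the function below builds a path from the descriptor type, the publication time, and the digest alone. It is Python for brevity, and the directory layout, the `storage_path` name, and its parameters are assumptions made for this example, not necessarily what CollecTor does:

```python
from datetime import datetime


def storage_path(descriptor_type: str, published: datetime, digest_hex: str) -> str:
    """Derive a file path from type, publication time, and digest only.

    The layout <type>/<year>/<month>/<d1>/<d2>/<digest> is hypothetical.
    Nothing else in the descriptor is inspected, so a descriptor that
    fails other checks can still be placed.
    """
    if len(digest_hex) < 2:
        raise ValueError("digest too short to derive a path")
    digest = digest_hex.lower()
    return "/".join([
        descriptor_type,
        published.strftime("%Y"),
        published.strftime("%m"),
        digest[0],
        digest[1],
        digest,
    ])
```

Anything beyond these fields would then only matter for warnings, not for whether the file gets written.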
> * exitlists: always stored as a single file.
Yup. Nothing special here, I think.
> * onionperf: currently implemented using an implicit transaction,
>   i.e., all descriptors in one downloaded descriptor file are only stored,
>   if all were valid. This is different from the sync-approach where
>   invalid/unparseable descriptors are ignored, but valid ones stored no
>   matter if they came in one file.
The implicit transaction thing was a mistake that we should fix. But
similar to relaydescs, we should think about accepting all measurements
containing just enough data to put them into the right directories. We can
still warn about validation errors, just like we do when downloading
relaydescs, but we'd store them anyway.
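A minimal sketch of that behavior, again in Python for brevity and with made-up names (`store_measurements`, `REQUIRED_FOR_PATH`, the `source` and `start` keys are all assumptions for this example): each measurement is handled on its own, a validation problem only produces a warning, and only a measurement lacking the data needed to place it is left out.

```python
import logging

log = logging.getLogger("persist-sketch")

# Hypothetical minimal metadata needed to pick a directory.
REQUIRED_FOR_PATH = ("source", "start")


def store_measurements(measurements, validate, store):
    """Store measurements one by one instead of all-or-nothing per file.

    `measurements` is an iterable of dicts, `validate` returns None or a
    problem description, and `store` is a dict keyed by (source, start).
    Returns the number of measurements stored.
    """
    stored = 0
    for m in measurements:
        if any(key not in m for key in REQUIRED_FOR_PATH):
            # Without this we cannot even choose a directory.
            log.warning("cannot place measurement, skipping: %r", m)
            continue
        problem = validate(m)
        if problem is not None:
            # Warn, but store anyway.
            log.warning("validation problem, storing anyway: %s", problem)
        store.setdefault((m["source"], m["start"]), []).append(m)
        stored += 1
    return stored
```

One bad measurement in a downloaded file then no longer prevents the other measurements in that file from being stored.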
Regarding webstats: Maybe we can take an approach similar to the one we
have for bridgedescs, where the sanitizer decides what files or lines go
through and the persistence layer just stores what it gets as long as it
has the necessary metadata. We don't really need synchronization here, for
the same reason we don't need it for bridgedescs.
Is this the information you were looking for?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23421#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online