[tor-bugs] #18910 [Metrics/CollecTor]: distributing descriptors accross CollecTor instances
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Oct 12 15:58:16 UTC 2016
#18910: distributing descriptors accross CollecTor instances
-------------------------------+---------------------------------
Reporter: iwakeh | Owner: iwakeh
Type: enhancement | Status: needs_review
Priority: High | Milestone: CollecTor 1.1.0
Component: Metrics/CollecTor | Version:
Severity: Normal | Resolution:
Keywords: ctip | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------+---------------------------------
Comment (by karsten):
Next in line is 4efd190:
- In `SyncManager`, I think we'll have to obtain the `collectionDate`
from the corresponding `CollecTorMain` subclass somehow, so that we're
appending descriptors to the same files as previously downloaded or
locally read descriptors. Otherwise we'd have two files in the `recent/`
directory per descriptor type, making the index larger than necessary.
Let's avoid that if we can.
- As stated in an earlier comment, there are unused variables and imports
in this commit that should be removed.
- Still in `SyncManager`, method `mergeWithLocalStorage()`, it seems that
we're using the exact same path for parse history files for different
descriptors. Example: we're reading descriptor type A with history file
X, and when we're done we're reading descriptor type B with the same
history file X. The effect is that we're not skipping any B descriptors
and overwriting everything in X that has to do with A. Basically, we're
reading the entire directory of synced descriptors in every execution.
What we need is a distinct history file per descriptor type. Or we could
use a single descriptor reader regardless of descriptor type with a single
history file. By the way, this could have caused your OOM troubles.
- Your log statement "Done reading {} of type {}." is misleading. The
`readDescriptors()` method returns immediately, but that does not at all
mean that there are any descriptors in the iterator at that point. A
better message would be "Started reading ...".
- In `SyncPersistence`, method `storeDescs()`, the documentation says
"OutputDirectory" where it should say "OutputPath".
- Regarding the three `//` comments in `storeDescs()`: it's indeed
unfortunate that we need to infer the authority fingerprint of a bridge
network status from the file name. We should consider adding a new line
for that to sanitized bridge network statuses. That's entirely unrelated
to the changes here but probably worth a ticket.
- The second comment there is probably moot with microdescriptor syncing
being postponed.
- Regarding the third comment about exit lists, we're currently using the
initial download time for storing in `out/` and `recent/`. But I think
that's something to ensure in `ExitlistPersistence`, not here, so the code
here can stay unchanged.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18910#comment:33>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list