[tor-bugs] #12676 [Metrics Data Processor]: Bridge descriptors in CollecTor's recent/ directory contain many duplicates
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Jul 22 07:53:49 UTC 2014
#12676: Bridge descriptors in CollecTor's recent/ directory contain many duplicates
------------------------------------+---------------------
Reporter: karsten | Owner:
Type: defect | Status: new
Priority: minor | Milestone:
Component: Metrics Data Processor | Version:
Keywords: | Actual Points:
Parent ID: | Points:
------------------------------------+---------------------
The `recent/` directory should only contain new descriptors, and ideally
no duplicates. I just found that the latter is not the case:
{{{
$ grep -c "@type" recent/bridge-descriptors/server-descriptors/2014-07-22-07-04-02-server-descriptors
18175
$ grep -c "@type" recent/bridge-descriptors/extra-infos/2014-07-22-07-04-02-extra-infos
9723
}}}
Compare this to relay descriptors:
{{{
$ grep -c "@type" recent/relay-descriptors/server-descriptors/2014-07-22-07-05-52-server-descriptors
931
$ grep -c "@type" recent/relay-descriptors/extra-infos/2014-07-22-07-05-52-extra-infos
930
$ grep -c "@type" recent/relay-descriptors/microdescs/micro/2014-07-22-07-05-52-micro
30
}}}
The reason is that only novel relay descriptors are downloaded and stored
to disk, whereas the bridge descriptor tarballs we parse are full
snapshots of Tonga's cached descriptor files. We need to add a check for
whether we already have a given sanitized bridge descriptor, and only
store it if we don't.
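A minimal sketch of such a check, in Python for illustration only. The
function name `filter_new`, the use of a SHA-1 digest over the sanitized
descriptor bytes as the key, and the in-memory `seen` set are all
assumptions, not the actual implementation; a real fix would also need to
persist the set of known digests between runs.

```python
import hashlib
from typing import Iterable, Iterator, Set


def filter_new(descriptors: Iterable[bytes], seen: Set[str]) -> Iterator[bytes]:
    """Yield only descriptors whose digest is not yet in `seen`.

    `seen` is updated in place, so passing the same set to later calls
    also suppresses descriptors that were stored by earlier calls.
    """
    for descriptor in descriptors:
        # Hypothetical key: SHA-1 over the sanitized descriptor bytes.
        digest = hashlib.sha1(descriptor).hexdigest()
        if digest in seen:
            continue  # already stored, skip the duplicate
        seen.add(digest)
        yield descriptor
```

The caller would write only the yielded descriptors to `recent/`, so a
full snapshot that mostly repeats earlier descriptors results in a small
file.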
Priority is minor because the only cost is some additional load on clients
that parse the same descriptor more than once. Other than that, it's
mostly harmless.
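The `grep -c` counts above give totals only. To see how many entries in a
single file are actual duplicates, something like the following could be
used. This is a sketch that assumes every descriptor starts with a line
beginning with `@type` and that byte-identical text means a duplicate;
`count_descriptors` is a made-up helper name.

```python
import hashlib
from typing import List, Optional, Tuple


def count_descriptors(text: str) -> Tuple[int, int]:
    """Return (total, distinct) descriptor counts for one concatenated file."""
    chunks: List[str] = []
    current: Optional[List[str]] = None
    for line in text.splitlines(keepends=True):
        if line.startswith("@type"):
            # An annotation line starts a new descriptor; flush the previous one.
            if current is not None:
                chunks.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        chunks.append("".join(current))
    # Descriptors with identical text hash to the same digest.
    distinct = {hashlib.sha1(c.encode("utf-8")).hexdigest() for c in chunks}
    return len(chunks), len(distinct)
```

Note that `grep -c "@type"` counts every line containing the string, while
this sketch only treats lines starting with it as descriptor boundaries,
so the totals could differ slightly.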
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12676>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online