[tor-bugs] #18967 [Metrics/Onionoo]: Add UUID to families in Onionoo
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri May 13 08:15:31 UTC 2016
#18967: Add UUID to families in Onionoo
-----------------------------+---------------------
Reporter: seansaito | Owner:
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Onionoo | Version:
Severity: Normal | Resolution:
Keywords: persistence, | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+---------------------
Comment (by karsten):
Replying to [comment:2 seansaito]:
> Hi Karsten,
>
> Apologies for the late reply. Is there a centralized data store where we
> can keep the UUIDs for each fingerprint? That way only one of the servers
> has to generate the UUID once.
I'd really like to avoid relying on a centralized data store, because
that's a single point of failure. Let's try to find a model where that is
not required.
> As for 1), do you think there is a problem with the larger family
> subsuming the smaller family? So the smaller family's UUID would be
> overwritten. For 2), based on my current scheme, there would need to be an
> additional component conducting sanity checks of the UUIDs (i.e. making
> sure no two families have the same UUID).
This is not an implementation issue that can be fixed by performing sanity
checks, but a conceptual one. Consider the following series of family
UUID assignments by two separate Onionoo instances, where the second
misses an update in the middle and receives it at a later time:
{{{
AB=1, CDE=2 -> ABCDE=2 -> ABC=2
AB=1, CDE=2 -> -> ABC=1 -> ABCDE=1
}}}
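Both instances did the right thing according to your algorithm, yet they
come up with a different identifier for this family in the end. This is
bad: an identifier is only useful if every instance derives the same one,
no matter in which order it receives updates.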
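The sketch below makes the divergence concrete by replaying both update
sequences under one rule that fits the example. Under that rule, a new
family inherits the UUID of the existing family it shares the most members
with, and UUIDs are plain integers. The rule, `apply_update`, and `replay`
are hypothetical reconstructions for illustration only. They are not
necessarily the exact scheme from comment 2.

```python
from itertools import count


def apply_update(families, new_family, fresh):
    """Apply one family snapshot under a hypothetical "largest overlap wins" rule.

    families:   dict mapping UUID -> frozenset of fingerprints (disjoint sets)
    new_family: frozenset of fingerprints that now form one family
    fresh:      iterator yielding unused UUIDs (plain integers here)
    Returns the updated dict and the UUID given to new_family.
    """
    overlaps = {u: len(m & new_family) for u, m in families.items() if m & new_family}
    if overlaps:
        # Inherit the UUID of the family sharing the most members (ties: smallest UUID).
        winner = min(overlaps, key=lambda u: (-overlaps[u], u))
    else:
        winner = next(fresh)
    updated = {}
    for u, members in families.items():
        rest = members - new_family
        if rest:
            # Leftover members keep their UUID, unless that UUID moves to new_family.
            updated[next(fresh) if u == winner else u] = rest
    updated[winner] = new_family
    return updated, winner


def replay(snapshots):
    """Process family snapshots in order and record 'MEMBERS=UUID' after each step."""
    families, fresh, history = {}, count(1), []
    for snapshot in snapshots:
        families, uuid = apply_update(families, frozenset(snapshot), fresh)
        history.append(f"{''.join(sorted(snapshot))}={uuid}")
    return history


print(replay(["AB", "CDE", "ABCDE", "ABC"]))  # ['AB=1', 'CDE=2', 'ABCDE=2', 'ABC=2']
print(replay(["AB", "CDE", "ABC", "ABCDE"]))  # ['AB=1', 'CDE=2', 'ABC=1', 'ABCDE=1']
```

Each step is locally reasonable. Even so, the final UUID depends on arrival
order: the first instance ends with 2 and the second with 1.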
Here's an alternative algorithm. We define the largest fingerprint ever
seen in an extended family as the family identifier. If we assume that
`A < B < C < D < E`, then we'd have the following identifiers in the
situation above:
{{{
AB=B, CDE=E -> ABCDE=E -> ABC=E
AB=B, CDE=E -> -> ABC=C -> ABCDE=E
}}}
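Here is a minimal sketch of this rule. It assumes families are only ever
merged, using a union-find whose root is always the largest fingerprint in
its set. It also assumes fingerprints compare as equal-length uppercase hex
strings, so string order matches numeric order. `FamilyIds` and
`final_identifier` are hypothetical names. Because this sketch never
forgets a link, it already reports `E` at the `ABC` step of the second
instance, where the table above shows `C`. Only the final identifiers are
compared here.

```python
from itertools import permutations


class FamilyIds:
    """Hypothetical sketch: families are merged but never split, and each
    set's union-find root is always its largest fingerprint."""

    def __init__(self):
        self.parent = {}

    def find(self, fp):
        self.parent.setdefault(fp, fp)  # unseen relays start as their own family
        while self.parent[fp] != fp:
            self.parent[fp] = self.parent[self.parent[fp]]  # path halving
            fp = self.parent[fp]
        return fp

    def observe(self, family):
        """Merge all members of one observed family into a single set."""
        members = sorted(family)
        for fp in members[1:]:
            a, b = self.find(members[0]), self.find(fp)
            if a != b:
                # Attach the smaller root below the larger one, so the root
                # stays the largest fingerprint ever seen in the merged family.
                self.parent[min(a, b)] = max(a, b)


def final_identifier(snapshots, member="A"):
    ids = FamilyIds()
    for snapshot in snapshots:
        ids.observe(snapshot)
    return ids.find(member)


print(final_identifier(["AB", "CDE", "ABCDE", "ABC"]))  # E
print(final_identifier(["AB", "CDE", "ABC", "ABCDE"]))  # E
print(all(final_identifier(order) == "E"
          for order in permutations(["AB", "CDE", "ABCDE", "ABC"])))  # True
```

Taking a maximum is commutative and associative, so every processing order
of the same snapshots ends with the same identifier.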
This model doesn't have the issue where the order of processing
descriptors leads to different identifiers, but it has another minor (?)
issue: there's no way to split a family. Once a relay says it's part of a
family and one member of that family says the same, they get the same
family identifier. No divorce, no annulment possible. The smaller
families will always be shown together when a user looks up their family
identifier.
Note that the implementation would ideally only rely on server
descriptors, regardless of when those were referenced by a consensus. When
a relay `A` publishes a descriptor saying that it considers `B` to be part
of its family, we store that forever. And when `B` publishes a descriptor
saying that `A` is part of its family, then both are considered part of
the same family, whose identifier is `B` in this case.
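The sketch below covers this descriptor-based variant. It assumes each
server descriptor has already been reduced to a relay fingerprint plus the
set of fingerprints it declares as family. `DescriptorFamilies` is a
hypothetical name. Declarations are only ever added, and two relays are
linked once each has declared the other. The identifier is the largest
fingerprint reachable through such mutual links.

```python
from itertools import permutations


class DescriptorFamilies:
    """Hypothetical sketch: family declarations from server descriptors are
    stored forever; two relays are linked once each has declared the other."""

    def __init__(self):
        self.declared = {}  # fingerprint -> all fingerprints it ever declared

    def add_descriptor(self, fingerprint, family):
        # Only ever add declarations, so processing order cannot matter.
        self.declared.setdefault(fingerprint, set()).update(family)

    def _mutual_links(self, fingerprint):
        return {other for other in self.declared.get(fingerprint, set())
                if fingerprint in self.declared.get(other, set())}

    def identifier(self, fingerprint):
        """Largest fingerprint reachable from `fingerprint` via mutual links."""
        seen, stack = {fingerprint}, [fingerprint]
        while stack:
            for other in self._mutual_links(stack.pop()):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        return max(seen)


# A and B declare each other; C declares A, but A never declares C.
descriptors = [("A", {"B"}), ("B", {"A"}), ("C", {"A"})]
for order in permutations(descriptors):
    families = DescriptorFamilies()
    for fp, family in order:
        families.add_descriptor(fp, family)
    print(families.identifier("A"), families.identifier("C"))  # B C (every order)

families.add_descriptor("A", set())  # A later publishes an empty family line...
print(families.identifier("A"))      # ...but B is still the identifier
```

Declarations are never removed, so a later descriptor from `A` without `B`
does not split the two again. This is the "no divorce" property mentioned
above.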
How does that sound? Would you like to implement this algorithm and feed
a month of recent descriptors into it to see what identifiers you'd come
up with? This is something we should test a bit, ideally in a stand-alone
Python script or small Java program, before building it into Onionoo.
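For such a stand-alone script, a first step could be reducing each server
descriptor to a fingerprint and a set of declared family members. The
sketch below assumes the text format from Tor's directory protocol
specification. In that format, a `fingerprint` line holds space-separated
hex groups, and a `family` line lists nicknames or `$`-prefixed hex
fingerprints. The regex also tolerates a trailing `=nickname` or
`~nickname`, in case such long-name entries appear. `parse_family` is a
hypothetical helper. It ignores nickname-only entries, on the assumption
that nicknames are not unique identifiers.

```python
import re

# Hypothetical helper pattern: "$" + 40 hex digits, optionally followed by
# "=nickname" or "~nickname". Nickname-only entries do not match.
HEX_FAMILY_ENTRY = re.compile(r"^\$([0-9A-Fa-f]{40})(?:[=~].*)?$")


def parse_family(descriptor_text):
    """Return (fingerprint, declared family fingerprints) for one server descriptor."""
    fingerprint, family = None, set()
    for line in descriptor_text.splitlines():
        keyword, _, args = line.partition(" ")
        if keyword == "fingerprint":
            # "AAAA BBBB ..." -> "AAAABBBB..."; uppercase so string order is consistent.
            fingerprint = args.replace(" ", "").upper()
        elif keyword == "family":
            for entry in args.split():
                match = HEX_FAMILY_ENTRY.match(entry)
                if match:
                    family.add(match.group(1).upper())
    return fingerprint, family


# Made-up descriptor fragment for illustration (not a real relay).
sample = "\n".join([
    "router relayA 192.0.2.1 9001 0 0",
    "fingerprint " + " ".join(["AAAA"] * 10),
    "family $" + "B" * 40 + " someNickname $" + "c" * 40 + "~relayC",
])
fingerprint, family = parse_family(sample)
print(fingerprint == "A" * 40, family == {"B" * 40, "C" * 40})  # True True
```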
Thanks!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18967#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online