[tor-bugs] #18967 [Metrics/Onionoo]: Add UUID to families in Onionoo

Fri May 13 08:15:31 UTC 2016

#18967: Add UUID to families in Onionoo
-----------------------------+---------------------
 Reporter:  seansaito        |          Owner:
     Type:  enhancement      |         Status:  new
 Priority:  Medium           |      Milestone:
Component:  Metrics/Onionoo  |        Version:
 Severity:  Normal           |     Resolution:
 Keywords:  persistence,     |  Actual Points:
Parent ID:                   |         Points:
 Reviewer:                   |        Sponsor:
-----------------------------+---------------------

Comment (by karsten):

 Replying to [comment:2 seansaito]:
 > Hi Karsten,
 >
 > Apologies for the late reply. Is there a centralized data store where we
 > can keep the UUIDs for each fingerprint? That way only one of the servers
 > has to generate the UUID once.

 I'd really want to avoid relying on a centralized data store, because
 that's a single point of failure.  Let's try to find a model where that is
 not required.

 > As for 1), do you think there is a problem with the larger family
 > subsuming the smaller family? So the smaller family's UUID would be
 > overwritten. For 2), based on my current scheme, there would need to be an
 > additional component conducting sanity checks of the UUIDs (i.e. making
 > sure no two families have the same UUID).

 This is not an implementation issue that can be fixed by performing sanity
 checks, but a conceptual one.  Consider the following series of family
 UUID assignments by two separate Onionoo instances, where the second
 misses an update in the middle and receives it at a later time:

 {{{
  AB=1, CDE=2 -> ABCDE=2 -> ABC=2
  AB=1, CDE=2 ->         -> ABC=1 -> ABCDE=1
 }}}

 Both instances did the right thing according to your algorithm, yet they
 come up with a different identifier for this family in the end.  This is
 bad.
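 To see the divergence concretely, the trace can be replayed with a small
 Python sketch.  The exact UUID scheme isn't spelled out here, so the sketch
 assumes one hypothetical rule that reproduces both lines of the trace: a
 newly observed family inherits the identifier that most of its members
 already carry, and gets a fresh identifier only if none of its members has
 one yet.  The function name `assign` is made up for illustration.

```python
from collections import Counter
from itertools import count


def assign(observations):
    """Replay family observations and return the final relay -> id map.

    Hypothetical rule: a newly observed family takes the identifier that most
    of its members currently carry (ties go to the smaller identifier); if
    none of its members has an identifier yet, it gets a fresh one.
    """
    next_id = count(1)
    ids = {}  # relay -> current family identifier
    for family in observations:
        votes = Counter(ids[r] for r in family if r in ids)
        if votes:
            # Most votes wins; on a tie, prefer the smaller identifier.
            family_id = min(votes, key=lambda i: (-votes[i], i))
        else:
            family_id = next(next_id)
        for relay in family:
            ids[relay] = family_id
    return ids


# The two instances from the trace; the second one misses the ABCDE update
# and only receives it after ABC.
first = assign(["AB", "CDE", "ABCDE", "ABC"])
second = assign(["AB", "CDE", "ABC", "ABCDE"])
print(first["A"], second["A"])  # 2 1
```

 Both replays apply the same rule to the same four observations; only the
 order of the last two differs, yet relays `A`, `B` and `C` end up with
 identifier 2 in one instance and 1 in the other.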

 Here's an alternative algorithm.  We define the largest fingerprint ever
 seen in an extended family as the family identifier.  If we assume that
 `A < B < C < D < E`, then we'd have the following identifiers in the
 situation above:

 {{{
  AB=B, CDE=E -> ABCDE=E -> ABC=E
  AB=B, CDE=E ->         -> ABC=C -> ABCDE=E
 }}}

 This model doesn't have the issue where the order of processing
 descriptors leads to different identifiers, but it has another minor (?)
 issue: there's no way to split a family.  Once a relay says it's part of a
 family and one member of that family says the same about it, they share
 the same family identifier for good.  No divorce, no annulment possible.
 If a family later splits into smaller families, those smaller families
 will always be shown together when a user looks up their family
 identifier.
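 The largest-fingerprint rule maps naturally onto a union-find structure in
 which the root of every set is kept at its largest member, so finding a
 relay's root yields its family identifier directly.  The following is a
 minimal Python sketch, not Onionoo code; it assumes fingerprints are
 equal-length uppercase hex strings (so string order matches numeric order)
 and treats every observed family as a permanent merge.  `FamilyIds` and
 `replay` are hypothetical names.

```python
class FamilyIds:
    """Union-find whose set roots are always the largest fingerprint."""

    def __init__(self):
        self.parent = {}

    def find(self, fp):
        """Return the family identifier (largest fingerprint) for fp."""
        self.parent.setdefault(fp, fp)
        root = fp
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while fp != root:
            nxt = self.parent[fp]
            self.parent[fp] = root
            fp = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Hang the smaller root below the larger one, so the root of the
            # merged set is still its largest member.
            small, large = sorted((ra, rb))
            self.parent[small] = large

    def observe(self, family):
        """Merge all members of an observed family; merges are never undone."""
        members = list(family)
        for other in members[1:]:
            self.union(members[0], other)


def replay(observations):
    ids = FamilyIds()
    for family in observations:
        ids.observe(family)
    return ids


first = replay(["AB", "CDE", "ABCDE", "ABC"])
second = replay(["AB", "CDE", "ABC", "ABCDE"])
print(first.find("A"), second.find("A"))  # E E
```

 In both replays every relay ends up with identifier `E`.  Since merges are
 never undone, feeding in a later, smaller family such as `AB` changes
 nothing, which is exactly the "no divorce" property.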

 Note that the implementation would ideally only rely on server
 descriptors, regardless of when those were referenced by a consensus.
 When a relay `A` publishes a descriptor saying that it considers `B` to be
 part of its family, we store that forever.  And when `B` publishes a
 descriptor saying that `A` is part of its family, then they're considered
 part of the same family, whose identifier here is `B`.
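 The descriptor-only variant could look like the following minimal Python
 sketch.  It assumes each server descriptor has already been reduced to the
 publisher's fingerprint plus the fingerprints listed in its family line
 (nicknames ignored).  Every declaration is stored forever, two relays are
 linked once each has listed the other, and a relay's identifier is the
 largest fingerprint reachable over such mutual links.  `FamilyStore` and
 its methods are hypothetical names.

```python
from collections import defaultdict


class FamilyStore:
    def __init__(self):
        # publisher -> every fingerprint it has ever listed as family.
        self.declared = defaultdict(set)

    def add_descriptor(self, publisher, family):
        """Record a descriptor's family line; nothing is ever removed."""
        self.declared[publisher].update(f for f in family if f != publisher)

    def _mutual(self, fp):
        """Relays that fp listed and that also listed fp."""
        return {other for other in self.declared.get(fp, ())
                if fp in self.declared.get(other, ())}

    def family_id(self, fp):
        """Largest fingerprint reachable from fp over mutual declarations."""
        seen, todo = {fp}, [fp]
        while todo:
            for other in self._mutual(todo.pop()):
                if other not in seen:
                    seen.add(other)
                    todo.append(other)
        return max(seen)


store = FamilyStore()
store.add_descriptor("A", ["B"])
print(store.family_id("A"))  # prints A: B has not listed A yet
store.add_descriptor("B", ["A"])
print(store.family_id("A"))  # prints B
store.add_descriptor("B", [])  # B later drops A; the stored link remains
print(store.family_id("A"))  # prints B
```

 Computing identifiers on demand keeps the stored state down to the raw
 declarations, which is all that has to be kept forever under this model.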

 How does that sound?  Would you want to implement this algorithm and feed
 a month of recent descriptors into it to see what identifiers you'd come
 up with?  This is something we should test a bit, ideally in a stand-alone
 Python script or small Java program, before building it into Onionoo.
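 A stand-alone test along those lines might start from the following Python
 sketch.  It assumes server descriptors in their plain-text form,
 concatenated into one string (for example, a month of the server-descriptor
 archives that CollecTor publishes), keeps only `$`-prefixed fingerprints
 from `family` lines and skips nickname entries, and applies the
 mutual-declaration and largest-fingerprint rules.  `parse_descriptors` and
 `family_ids` are hypothetical helper names; a real harness would also need
 to read the archive files and cope with the full descriptor format.

```python
import re
from collections import defaultdict

# A '$'-prefixed, 40-hex-digit fingerprint; any "=nick"/"~nick" suffix is ignored.
FP_RE = re.compile(r"\$([0-9A-Fa-f]{40})")


def parse_descriptors(text):
    """Yield (fingerprint, listed family fingerprints) per server descriptor.

    Simplification: only '$'-prefixed fingerprints in the family line are
    kept; nickname entries are skipped.
    """
    fingerprint, family = None, []
    # The trailing "router" sentinel flushes the last descriptor.
    for line in text.splitlines() + ["router"]:
        keyword = line.split(" ", 1)[0]
        if keyword == "router":
            if fingerprint:
                yield fingerprint, family
            fingerprint, family = None, []
        elif keyword == "fingerprint":
            # "fingerprint AAAA BBBB ..." -> one 40-character hex string.
            fingerprint = "".join(line.split()[1:]).upper()
        elif keyword == "family":
            family = [fp.upper() for fp in FP_RE.findall(line)]


def family_ids(descriptors):
    """Map each publisher to the largest fingerprint reachable over mutual
    family declarations; declarations are accumulated and never dropped."""
    declared = defaultdict(set)
    for fp, family in descriptors:
        declared[fp].update(family)
    mutual = defaultdict(set)
    for fp, listed in declared.items():
        for other in listed:
            if fp in declared.get(other, ()):
                mutual[fp].add(other)
    ids = {}
    for start in declared:
        if start in ids:
            continue
        component, todo = {start}, [start]
        while todo:
            for other in mutual[todo.pop()]:
                if other not in component:
                    component.add(other)
                    todo.append(other)
        family_id = max(component)
        for fp in component:
            ids[fp] = family_id
    return ids


# Two made-up descriptors, reduced to the lines the parser looks at.
A, B = "A" * 40, "B" * 40


def spaced(fp):
    return " ".join(fp[i:i + 4] for i in range(0, 40, 4))


sample = (
    f"router relayA 192.0.2.1 9001 0 0\nfingerprint {spaced(A)}\nfamily ${B}\n"
    f"router relayB 192.0.2.2 9001 0 0\nfingerprint {spaced(B)}\n"
    f"family ${A} someNickname\n"
)
ids = family_ids(parse_descriptors(sample))
print(ids[A] == B, ids[B] == B)  # True True
```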
 Thanks!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/18967#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online