[tor-bugs] #28424 [Core Tor/Tor]: Refactor hs_service_callback() to no longer need to run once per second?
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Nov 27 18:24:02 UTC 2018
#28424: Refactor hs_service_callback() to no longer need to run once per second?
--------------------------+------------------------------------
Reporter: nickm | Owner: (none)
Type: defect | Status: new
Priority: Medium | Milestone: Tor: 0.4.0.x-final
Component: Core Tor/Tor | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor: Sponsor8-can
--------------------------+------------------------------------
Comment (by akwizgran):
If I'm reading the spec right, there are `hsdir_n_replicas = 2` replicas.
For each replica, the HS uploads the descriptor to `hsdir_spread_store =
4` HSDirs at consecutive positions in the hashring. Each client tries to
fetch the descriptor from one of the first `hsdir_spread_fetch = 3`
positions, chosen at random.
A lookup fails when, for both replicas, the position chosen by the client is
occupied by an HSDir that didn't receive the descriptor. So failure is
possible when, for both replicas, ''any'' of the first 3 positions is
occupied by an HSDir that didn't receive the descriptor. Churn can bring
this about in two ways: by removing the HSDirs that received the
descriptor, and by adding new HSDirs that push the HSDirs that received
the descriptor out of the first 3 positions.
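To make the per-replica condition concrete, here's a toy example (the
labels are made up and just stand for HSDir identities):
{{{#!python
# Hypothetical per-replica example: the descriptor was uploaded to the
# HSDirs at the first hsdir_spread_store = 4 positions at upload time...
stored_to = {"A", "B", "C", "D"}
# ...but churn has since pushed a new HSDir "X" into the first
# hsdir_spread_fetch = 3 positions:
first_three = ["A", "B", "X"]
# A client that picks the third position asks "X", which has no copy of
# the descriptor, so a lookup on this replica can fail:
lookup_can_fail = not set(first_three) <= stored_to   # True
}}}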
How long do we expect it to take before churn makes a lookup failure
possible? We could measure this with historical consensus data, but let's
try a quick simulation first.
Figure 9 of
[https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_winter.pdf
this paper] shows the fraction of relays with the HSDir flag that join or
leave between consecutive consensuses. I'd estimate 0.01 by eye, so let's
conservatively call it 0.02. The churn rate counts both joins and leaves,
so a churn rate of 0.02 means each HSDir from the previous consensus has
left with probability 0.01, and new HSDirs have joined at the same rate.
There are
[https://metrics.torproject.org/relayflags.html?start=2018-08-29&end=2018-11-27&flag=HSDir
about 3,000] relays with the HSDir flag.
My code (attached) simulates each replica by creating 3,000 HSDirs, each
at a random position on the hashring, and remembering the first 4 HSDirs
on the hashring - these are the ones that receive copies of the
descriptor. Churn is simulated an hour at a time. In each hour, each HSDir
is removed with probability 0.01 and replaced with a new HSDir at a random
position. Then the code checks whether the first 3 HSDirs on the hashring
are all ones that received copies of the descriptor. If not, a lookup on
this replica could fail.
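The attachment isn't included in this message, but a minimal Python
sketch of the per-replica model described above might look like this
(constant and helper names are illustrative, not taken from the attached
code):
{{{#!python
import random

N_HSDIRS = 3000        # relays with the HSDir flag
SPREAD_STORE = 4       # hsdir_spread_store: HSDirs the descriptor is uploaded to
SPREAD_FETCH = 3       # hsdir_spread_fetch: positions a client may fetch from
LEAVE_PER_HOUR = 0.01  # per-HSDir probability of leaving in a given hour

def new_replica():
    """One replica: HSDir positions on the [0, 1) hashring, plus the set
    of positions that received the descriptor (the first SPREAD_STORE
    HSDirs at upload time)."""
    ring = [random.random() for _ in range(N_HSDIRS)]
    stored_to = set(sorted(ring)[:SPREAD_STORE])
    return ring, stored_to

def churn_one_hour(ring):
    """Each HSDir leaves with probability LEAVE_PER_HOUR and is replaced
    by a new HSDir at a fresh random position."""
    return [random.random() if random.random() < LEAVE_PER_HOUR else pos
            for pos in ring]

def replica_can_fail(ring, stored_to):
    """A lookup on this replica can fail once any of the first
    SPREAD_FETCH positions is held by an HSDir without the descriptor."""
    return not set(sorted(ring)[:SPREAD_FETCH]) <= stored_to
}}}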
For simplicity I've simulated the two replicas independently - in reality
they'd be based on different permutations of the same HSDirs, but
independence seems like a reasonable approximation. The simulation runs
until lookups on both replicas could fail.
The mean time until both replicas could fail is 37 hours, averaged over
10,000 runs.
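Reusing the helpers from the sketch above, the driver for that estimate
could be something like the following (again a reconstruction, not the
attached code):
{{{#!python
def hours_until_both_replicas_can_fail():
    # Advance both (independently simulated) replicas hour by hour and
    # stop at the first hour in which a lookup could fail on both.
    replicas = [new_replica(), new_replica()]
    hours = 0
    while not all(replica_can_fail(ring, stored) for ring, stored in replicas):
        replicas = [(churn_one_hour(ring), stored) for ring, stored in replicas]
        hours += 1
    return hours

RUNS = 10000
mean_hours = sum(hours_until_both_replicas_can_fail() for _ in range(RUNS)) / RUNS
print(mean_hours)  # the 37-hour figure above is the average of such runs
}}}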
If this is roughly accurate then we should be able to keep the HS
reachable by waking Tor from its dormant state every few hours to fetch a
fresh consensus and upload new copies of the descriptor if necessary.
Perhaps I should extend the simulation to consider the probability of
lookup failure as a function of time, rather than the mean time until
failure becomes possible.
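One way to do that, reusing the helpers above and the same independence
assumption, would be to track the expected failure probability at each
hour: under the model above the client picks one of the first 3 positions
at random on each replica and fails only if both picks miss, so the
failure probability for a run is the product of the two "bad" fractions
(my sketch):
{{{#!python
def bad_fraction(ring, stored_to):
    """Fraction of the first SPREAD_FETCH positions held by HSDirs that
    did not receive the descriptor."""
    first = sorted(ring)[:SPREAD_FETCH]
    return sum(pos not in stored_to for pos in first) / SPREAD_FETCH

def failure_probability_by_hour(horizon_hours, runs):
    # A lookup fails when the client's random choice misses on both
    # replicas, so the per-run failure probability at a given hour is the
    # product of the two bad fractions; average that over all runs.
    totals = [0.0] * horizon_hours
    for _ in range(runs):
        replicas = [new_replica(), new_replica()]
        for hour in range(horizon_hours):
            replicas = [(churn_one_hour(ring), stored) for ring, stored in replicas]
            totals[hour] += bad_fraction(*replicas[0]) * bad_fraction(*replicas[1])
    return [total / runs for total in totals]
}}}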
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/28424#comment:4>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online