[tor-bugs] #4631 [Core Tor/Tor]: Idea to make consensus voting more resistant
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Feb 19 01:19:43 UTC 2020
#4631: Idea to make consensus voting more resistant
-------------------------------------------------+-------------------------
Reporter: Sebastian                              | Owner: teor
Type: defect                                     | Status: needs_review
Priority: Medium                                 | Milestone: Tor: 0.4.4.x-final
Component: Core Tor/Tor                          | Version:
Severity: Normal                                 | Resolution:
Keywords: needs-dirauth-email                    | Actual Points: 0.2
  needs-torspec-update tor-dirauth               |
  robustness voting                              |
Parent ID: #33050                                | Points: 0.5
Reviewer: nickm                                  | Sponsor: Sponsor55-can
-------------------------------------------------+-------------------------
Changes (by teor):
* status: needs_revision => needs_review
Comment:
Replying to [comment:21 nickm]:
> Reviewing the idea only, before looking at the patch:
>
> So the idea is that we cut off the deadline for POSTed votes early, to
> increase the odds that fetching votes from elsewhere will give everybody the
> same set of votes? That's plausible, but I'm wondering whether we're
> replacing one race condition with another.
I think we *are* replacing one race condition with another, but in
practice, the replacement is less risky. Here's why:
* in the first phase, each authority uploads *its own vote* to each other
authority (1->N)
* in the second phase, each authority downloads *all votes* from each
other authority (N<-N)
So the second phase ensures that all well-connected authorities have the
same set of votes. But it requires "no partial, late uploads", because
each partial, late upload splits the set of authorities.
This patch enforces the "no late uploads" rule, which implies "no partial,
late uploads".
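To illustrate the split described above, here is a minimal sketch (not Tor
code). It assumes a toy model in which every on-time vote reaches every
authority, and one late vote arrives after the fetch phase. The names
`final_vote_sets` and `count_distinct` are hypothetical, and the late voter's
own view is left out of the comparison.

```python
def final_vote_sets(authorities, late_voter, late_recipients, reject_late):
    """Return the set of votes each authority (except the late voter) holds.

    Toy model: after the upload (1->N) and fetch (N<-N) phases, every
    authority holds the same on-time votes. The late vote then arrives
    at `late_recipients` only.
    """
    others = [a for a in authorities if a != late_voter]
    # After both phases, all other authorities agree on the on-time votes.
    held = {a: frozenset(others) for a in others}
    # A late upload after the fetch is only added where it is accepted.
    if not reject_late:
        for a in late_recipients:
            held[a] = held[a] | {late_voter}
    return held


def count_distinct(held):
    """Number of different vote sets; more than 1 means a split."""
    return len(set(held.values()))


authorities = ["A", "B", "C", "D", "E"]
# Partial, late upload from E that only reaches A and B.
accepting = final_vote_sets(authorities, "E", ["A", "B"], reject_late=False)
rejecting = final_vote_sets(authorities, "E", ["A", "B"], reject_late=True)
print(count_distinct(accepting))  # 2: {A, B} versus {C, D}
print(count_distinct(rejecting))  # 1: nobody counts the late vote
```

In this model, a full late upload (one that reaches every other authority)
also leaves a single vote set, which is why only partial late uploads cause
a split when all authorities apply the same rule.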
> I'm also interested in:
> * what happens during the transition period where some authorities
>   have upgraded but others have not.
Some authorities reject late uploads, others do not.
If the late upload is a full, late upload, then the set of authorities
splits between:
* authorities that reject the late upload, and
* authorities that accept the late upload.
The mitigation is to ensure that all authorities' clocks are synchronised
within ~5 minutes.
Right now, all authority clocks are synchronised to within ~1 second:
https://consensus-health.torproject.org/#authorityclocks
If the late upload is a partial, late upload, then the set of authorities
is already split. The authorities that reject the late upload are
protected from the split. (Strictly, they reject the late upload, and end
up in the set of authorities that didn't get the vote).
In my chutney tests, this patch replaces "mismatched digest" (consensus
failure) with "Vote received too late" (late vote rejection, consensus
success).
I've also done some testing in mixed 0.3.5 and master networks (equal
numbers of authorities). I haven't seen a consensus failure in a mixed
network after applying this patch; they were common before.
> * whether the fetch time is the right point for the POST cutoff, or
>   whether we'd be better off having it be a second or two before.
The cutoff is checked against the time at which the entire vote has been
received. So the current code works as-is. But ideally, we would want
multiple authorities to receive each vote before the cutoff.
For chutney (and other fast networks), the current cutoff should not be
changed: the intervals are only a few seconds long.
For the public network (and other slow networks), votes are about 3.5 MB,
and let's say authorities have about 10 Mbps of bandwidth spare for votes.
So that's:
`3 seconds per vote upload * 8 authorities = 24 seconds`
So let's set the cutoff to: `fetch_missing_votes - vote_delay/6` ?
Note that fetch_missing_votes is already `start - dist_delay -
(vote_delay/2)`, so we can't go any earlier than `fetch_missing_votes -
vote_delay/4`.
Also, fast networks need about 4 seconds to vote reliably (upload,
process, download, process), so the extra cutoff should be at most
`vote_delay/5`.
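As a rough sketch of the proposed timing, the formulas above can be written
out as follows. The function names are hypothetical, the divisor of 6 is only
the value suggested in this comment, and `dist_delay = 300` is an assumed
value for the public network (only the 300-second vote delay appears in the
arithmetic in this comment).

```python
def fetch_missing_votes(start, dist_delay, vote_delay):
    """Fetch time, using the formula quoted above."""
    return start - dist_delay - vote_delay / 2


def proposed_post_cutoff(start, dist_delay, vote_delay, divisor=6):
    """Suggested cutoff: fetch_missing_votes - vote_delay/divisor."""
    return fetch_missing_votes(start, dist_delay, vote_delay) - vote_delay / divisor


# Times are in seconds relative to `start` (so start = 0).
vote_delay = 300   # public network value used in this comment
dist_delay = 300   # assumption, not stated in this comment

fetch = fetch_missing_votes(0, dist_delay, vote_delay)
cutoff = proposed_post_cutoff(0, dist_delay, vote_delay)
print(fetch)           # -450.0
print(cutoff)          # -500.0
print(fetch - cutoff)  # 50.0 seconds between the POST cutoff and the fetch
```

With these numbers, the cutoff falls 50 seconds before the fetch, which is
larger than the 24-second upload estimate above. A divisor of 6 also gives a
smaller margin than either `vote_delay/5` or `vote_delay/4`.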
Interestingly, the "successful vote upload" bandwidth cutoff for public
authorities is about:
`3.5 MB * 8 vote uploads / 300 seconds * 8 bits = 0.75 Mbit/s`
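For anyone who wants to re-run the two estimates with different numbers, here
is a small sketch of the arithmetic. The helper names are hypothetical, and it
assumes decimal units (1 MB = 8 Mbit).

```python
BITS_PER_BYTE = 8


def upload_seconds(vote_mb, spare_mbps):
    """Seconds to upload one vote of `vote_mb` megabytes at `spare_mbps`."""
    return vote_mb * BITS_PER_BYTE / spare_mbps


def min_bandwidth_mbps(vote_mb, uploads, window_seconds):
    """Bandwidth needed to finish `uploads` vote uploads within the window."""
    return vote_mb * uploads * BITS_PER_BYTE / window_seconds


per_vote = upload_seconds(3.5, 10)          # about 2.8 s, rounded to 3 s above
all_votes = per_vote * 8                    # about 22.4 s (24 s after rounding)
threshold = min_bandwidth_mbps(3.5, 8, 300) # about 0.75 Mbit/s
print(round(per_vote, 2), round(all_votes, 2), round(threshold, 2))
```

The unrounded upload estimate is slightly below the 24 seconds quoted above
because the per-vote time is rounded up to 3 seconds there.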
> We should also think about our diagnostics for unsynchronized
> consensuses. Are they good enough that we'll be able to tell whether this
> is helping or hurting?
Yes, I've seen the diagnostics change in chutney from "mismatched digest"
to "Vote received too late".
If rejecting late votes splits the consensus, we'll see both "Vote
received too late" and "mismatched digest" for the same consensus period.
Replying to [comment:22 nickm]:
> Teor, how confident are you that this is actually helping out for the
> chutney case? Since you've been running into sync issues on chutney, I
> guess you've probably been testing this patch in that case?
It is surprisingly effective. I haven't seen a consensus failure in ~100
chutney runs since applying this patch.
> I've added a small comment on the patch. This looks reasonably solid.
I've fixed some of the logging in the patch.
Our remaining work is to choose the exact threshold.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/4631#comment:23>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online