[tor-bugs] #24767 [Core Tor/Tor]: All relays are constantly connecting to down relays and failing over and over

Tue Feb 6 20:14:27 UTC 2018

#24767: All relays are constantly connecting to down relays and failing over and
over
-------------------------------------------------+-------------------------
 Reporter:  arma                                 |          Owner:  dgoulet
     Type:  enhancement                          |         Status:
                                                 |  accepted
 Priority:  Very High                            |      Milestone:  Tor:
                                                 |  0.3.3.x-final
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  must-fix-before-033-stable, tor-     |  Actual Points:
  relay, tor-dos, performance                    |
Parent ID:                                       |         Points:
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by teor):

 Replying to [comment:8 arma]:
 > My thoughts for the first design would be:
 >
 > * No need for a complex backoff thing. Just remember the failure for 60
 seconds, and during those 60 seconds, send back a destroy immediately.
 Reducing from n attempts per minute down to 1 per minute should be enough
 to help us survive until the clients get a consensus update and stop
 asking us to try.

 60 seconds seems sensible: it is roughly how long it takes to restart a
 relay (tor takes 30s to shut down by default), so anything shorter would
 be ineffective.
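
 As a rough illustration of the "remember the failure for 60 seconds, then
 destroy immediately" idea, here is a minimal Python sketch. It is not tor
 code: ConnectFailureCache, handle_extend, try_connect and the "DESTROY" /
 "EXTENDED" return strings are all hypothetical names made up for this
 example, and the key is assumed to be the identity digest only.

```python
import time

FAILURE_TTL = 60.0  # seconds; the value suggested in this ticket


class ConnectFailureCache:
    """Hypothetical cache of recent connection failures, keyed by digest."""

    def __init__(self, ttl=FAILURE_TTL, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._failed_at = {}  # key -> time of the last recorded failure

    def note_failure(self, key):
        self._failed_at[key] = self._clock()

    def recently_failed(self, key):
        failed_at = self._failed_at.get(key)
        if failed_at is None:
            return False
        if self._clock() - failed_at >= self._ttl:
            # The entry has expired: forget it and allow a fresh attempt.
            del self._failed_at[key]
            return False
        return True


def handle_extend(cache, key, try_connect):
    """Answer with a destroy at once if the target failed within the TTL.

    try_connect is a caller-supplied callable (an assumption of this sketch)
    that returns True when the connection attempt succeeds.
    """
    if cache.recently_failed(key):
        return "DESTROY"
    if try_connect(key):
        return "EXTENDED"
    cache.note_failure(key)
    return "DESTROY"
```

 With this shape, n extend requests per minute for a down relay turn into
 one real connection attempt per minute; the rest are answered from the
 cache.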

 > * We should avoid having this cached-failure thing impact client
 behavior. That is, it should cause *other people's* circuits to get
 destroys, but it shouldn't auto-fail our own client attempts. Maybe we
 should change how the client behaves, but if so, let's do it later, and
 not introduce subtle breakage in something we'll be considering for
 backport.

 Excluding origin circuits seems fine.
 (Including them would also be fine, but only on relays. That said, we
 already have backoff and limits on origin circuits.)
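
 The origin-circuit exclusion could be as small as one early return. A
 Python sketch, where should_fail_fast and both of its parameters are
 hypothetical names for this example rather than anything in tor:

```python
def should_fail_fast(is_origin_circuit, target_recently_failed):
    """Decide whether to answer an extend with an immediate destroy.

    Only circuits built on behalf of other people consult the cached
    failure; our own (origin) circuits always go on to a real connection
    attempt, so client behavior is unchanged.
    """
    if is_origin_circuit:
        return False
    return target_recently_failed
```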

 > * Hm! I was going to say "but rep_hist_note_connect_failed() won't work
 if the relay isn't in our consensus", but actually, it is simply based on
 intended identity digest of the next destination, so it does look like we
 can reuse the data struct. Except, shouldn't we also be caching the
 IP:port that failed? Otherwise somebody can ask us to extend to a victim
 digest at an unworkable IP:port, and we'll cache "failure!" and then
 refuse all the legit cells going to the right address for the next minute.

 When we introduce IPv6 extends from relays (#24404), we only want to fail
 attempts to a single IP address, and not the whole relay. So dual-stack
 relays will get two attempts to other dual-stack relays: an IPv4 attempt
 and an IPv6 attempt. I like that design: it makes sense to try both.
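
 One way to get per-address failures is to key the cache on (identity
 digest, address, port) instead of the digest alone. The Python sketch below
 assumes that key shape; PerAddressFailures and failure_key are hypothetical
 names, the addresses are documentation-range examples, and the caller
 passes in the current time.

```python
import ipaddress

FAILURE_TTL = 60.0  # seconds; the value suggested in this ticket


def failure_key(identity_digest, address, port):
    # Parse the address so that equivalent spellings of one IPv6 address
    # map to the same cache entry.
    return (identity_digest, ipaddress.ip_address(address), port)


class PerAddressFailures:
    """Hypothetical failure cache keyed by (digest, address, port)."""

    def __init__(self, ttl=FAILURE_TTL):
        self._ttl = ttl
        self._failed_at = {}  # key -> time of the last recorded failure

    def note_failure(self, identity_digest, address, port, now):
        self._failed_at[failure_key(identity_digest, address, port)] = now

    def recently_failed(self, identity_digest, address, port, now):
        key = failure_key(identity_digest, address, port)
        failed_at = self._failed_at.get(key)
        return failed_at is not None and now - failed_at < self._ttl
```

 With this key, a failed IPv4 attempt leaves the IPv6 address of the same
 relay untried and still allowed, and a failure recorded for a bogus
 IP:port does not block extends to the relay's real address.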

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24767#comment:9>