[tor-bugs] #33018 [Core Tor/Tor]: Dir auths using an unsustainable 400+ mbit/s, need to diagnose and fix

Wed Jan 22 07:56:15 UTC 2020

#33018: Dir auths using an unsustainable 400+ mbit/s, need to diagnose and fix
----------------------------+------------------------
 Reporter:  arma            |          Owner:  (none)
     Type:  defect          |         Status:  new
 Priority:  Medium          |      Milestone:
Component:  Core Tor/Tor    |        Version:
 Severity:  Normal          |     Resolution:
 Keywords:  network-health  |  Actual Points:
Parent ID:                  |         Points:
 Reviewer:                  |        Sponsor:
----------------------------+------------------------

Comment (by arma):

 Possible next steps beyond the above branch which I think would be worth
 taking:

 * Whitelist (i.e. never send 503s to) the IP addresses of relays in the
 consensus too. Or maybe it's better to consider relays in our descriptor
 list (i.e. if we vote about it, whitelist it). I have a commented-out
 function conn_addr_is_relay() in the above branch which somebody would
 need to write, and it will need to be fast fast fast or the lookup won't
 be worth it. ahf sketched out that function as "if we extend routerlist_t
 to have a map from addr to a routerinfo_t and from the v6 address, then I
 think you can do it fast."

 * Whitelist the IP address for the consensus health checker (I think that
 might be carinatum.tpo) so it stops yelling and thinking we're down. :)

 * Consider giving higher priority to microdesc-consensus and microdesc
 replies. That is, I would rather have relays successfully cache and mirror
 the microdesc flavored stuff, if I have to choose.

 * Make a change to the Tor code so relays remain on the client fetch
 schedule (i.e. fetch from relays and fallback dirs) until they publish
 their descriptor. That way we remove one variable from the mystery, i.e.
 "maybe these Tors that are mobbing me are all configured as relays but
 haven't found themselves reachable so that's why I don't know about them."

 * Look for patterns in the non-relay IP addresses that are bombing us with
 consensus fetch attempts. How often do they come back asking for another
 one? Does that timing pattern make us think they are a well-behaved Tor
 that somehow thinks the dir auths' dirports are the best places to ask?

 * Consider a design for a more aggressive load shedding plan. Right now we
 send a 503 if we don't have enough space left in our global write bucket,
 or if we ran out of global write bucket in the previous second. For
 vanilla-flavored dirport consensus responses to non-relay IP addresses, I could
 imagine something much more aggressive, like "could I serve ten of these?
 No? Then 503." with the goal of actually leaving some room to serve the
 more important ones rather than always being full or nearly full.
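
 To make the relay-whitelist item and the load-shedding item concrete, here
 is a rough sketch in Python rather than Tor's C. It only illustrates the
 idea and is not the code in the branch. RelayAddrIndex, should_send_503(),
 the "vanilla" flavor label, and every parameter name are assumptions made
 up for this example; only conn_addr_is_relay() and the "could I serve ten
 of these?" threshold come from the comment itself. The sketch also assumes
 that whitelisted relays bypass the bucket check entirely.

```python
import ipaddress


class RelayAddrIndex:
    """Maps relay IP addresses (v4 and v6) to a relay record.

    Hypothetical stand-in for the addr -> routerinfo_t map suggested for
    routerlist_t; here the "record" is just a nickname string.
    """

    def __init__(self):
        self._by_addr = {}

    def add_relay(self, nickname, ipv4, ipv6=None):
        # ip_address() normalizes the text form, so equivalent spellings of
        # the same address end up under the same key.
        self._by_addr[ipaddress.ip_address(ipv4)] = nickname
        if ipv6 is not None:
            self._by_addr[ipaddress.ip_address(ipv6)] = nickname

    def conn_addr_is_relay(self, addr):
        # A single hash lookup, so the cost does not grow with relay count.
        return ipaddress.ip_address(addr) in self._by_addr


def should_send_503(addr, flavor, response_bytes, bucket_bytes,
                    ran_dry_last_second, index, headroom=10):
    """Return True if this directory request should be answered with a 503."""
    # Assumption: addresses of known relays are whitelisted and never get 503.
    if index.conn_addr_is_relay(addr):
        return False
    # Existing rule: the global write bucket ran out in the previous second.
    if ran_dry_last_second:
        return True
    # Aggressive rule for vanilla-flavored responses to non-relays: refuse
    # unless the bucket could cover `headroom` such responses. Everything
    # else only needs room for the one response being requested.
    copies = headroom if flavor == "vanilla" else 1
    return bucket_bytes < copies * response_bytes
```

 With a 5000-byte bucket and 1000-byte responses, a non-relay asking for
 the vanilla consensus would be refused, while the same address asking for
 the microdesc flavor would still be served. That leaves room for the
 requests the comment calls more important.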

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33018#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online