[tor-bugs] #24716 [Core Tor/DirAuth]: Try cranking up cbttestfreq consensus param, to see if it helps the current overload
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Dec 22 16:51:22 UTC 2017
#24716: Try cranking up cbttestfreq consensus param, to see if it helps the current
overload
----------------------------------+--------------------
Reporter: arma | Owner: (none)
Type: task | Status: new
Priority: Medium | Milestone:
Component: Core Tor/DirAuth | Version:
Severity: Normal | Keywords:
Actual Points: | Parent ID:
Points: | Reviewer:
Sponsor: |
----------------------------------+--------------------
In Tor 0.3.1.1-alpha, commit d5a151a, we switched:
{{{
-#define CBT_DEFAULT_TEST_FREQUENCY 60
+#define CBT_DEFAULT_TEST_FREQUENCY 10
}}}
And on May 20, 2017, the directory authorities set the cbttestfreq consensus
param to 10 as well.
Right now the network is overloaded with create cells from the millions of
new clients that have shown up in the past few weeks.
Hypothesis 1: most of these clients are in learning mode much of the time,
launching one test circuit every 10 seconds, so 5 million clients / 10
seconds = 500k new create requests per second launched at the network,
which contributes to the overload.
Hypothesis 2: some of these clients have learned quite low circuit build
timeouts, causing them to generate many circuits which they then almost
immediately cancel, but not enough of their circuits fail for them to back
away from their learned value.
Hypothesis 3: the clients are stuck in a sad loop. They learn a low
circuit build timeout (cbt) value, generate circuits for a while that
mostly time out, and eventually give up on their cbt value; then they
generate a circuit every 10s until they re-learn a low cbt value, and the
cycle repeats.
The experiment here (temporarily setting cbttestfreq to 600 seconds)
should help us test these hypotheses. For Hypothesis 1, we will
immediately reduce the load of new circuits, by up to 60x (600s / 10s).
For Hypothesis 2, this will help more slowly, because we'll have to wait
for each client to hit a situation where 90%+ of its circuit attempts are
being timed out; but in theory clients will slowly shift from a
too-aggressive cbt back into learning mode. And for Hypothesis 3, we'll
push most clients into the "learning, but very slowly" phase of their sad
loop.
We can use the notice-level heartbeat messages in relay logs to discover
whether the total number of create cells goes down dramatically. If it
does, win: we've confirmed one or more of these hypotheses, and we can
make a plan from there. If it doesn't, also a win: we know we need to look
elsewhere.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24716>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online