[tor-bugs] #24782 [Core Tor/Tor]: Set a lower default MaxMemInQueues value

Mon Jan 8 21:04:46 UTC 2018

#24782: Set a lower default MaxMemInQueues value
---------------------------------+------------------------------------
 Reporter:  teor                 |          Owner:  ahf
     Type:  defect               |         Status:  assigned
 Priority:  Medium               |      Milestone:  Tor: 0.3.2.x-final
Component:  Core Tor/Tor         |        Version:
 Severity:  Normal               |     Resolution:
 Keywords:  tor-relay, tor-ddos  |  Actual Points:
Parent ID:                       |         Points:  0.5
 Reviewer:                       |        Sponsor:
---------------------------------+------------------------------------

Comment (by teor):

 Replying to [comment:5 dgoulet]:
 > We could also explore the possibility for that value to be a moving
 target at runtime. It is a bit more dicey and complicated, but because Tor
 at startup looks at the "Total memory" instead of the "Available memory"
 to estimate that value, things can go badly quickly: if only 4 of 16 GB of
 RAM are available, Tor will still use 12 GB as a limit... and even with a
 fairly good amount of swap, it is likely to be killed by the OS's OOM
 killer at some point.
 >
 > On the flip side, a fast relay stuck with an estimate of 1 GB or 2 GB of
 RAM that Tor can use at startup won't be "fast" for much longer before
 Tor's OOM handler kicks in and starts killing old circuits.

 This is not what I have observed. I have some fast Guards. Under normal
 load they don't ever use much more than 1 - 2 GB total RAM.
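 dgoulet's concern is that the default limit is derived from total rather than available memory. A minimal sketch of the difference, assuming Linux /proc/meminfo as the input and a 3/4 fraction inferred only from the "12 GB limit on a 16 GB machine" example above (Tor's actual default logic may differ). parse_meminfo and estimate_limit are hypothetical helper names:

```python
# Hypothetical sketch. FRACTION = 3/4 is inferred from the comment's example
# (a 12 GB limit on a 16 GB machine); it is not copied from Tor's source.
FRACTION = 0.75


def parse_meminfo(text: str) -> dict:
    """Parse Linux /proc/meminfo-style text into {field: value in bytes}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue  # skip blank or malformed lines
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":  # /proc/meminfo "kB" means KiB
            value *= 1024
        fields[name.strip()] = value
    return fields


def estimate_limit(meminfo: dict, use_available: bool, fraction: float = FRACTION) -> int:
    """Queue limit derived from MemTotal (startup-style) or MemAvailable."""
    key = "MemAvailable" if use_available else "MemTotal"
    return int(meminfo[key] * fraction)


# 16 GiB total, only 4 GiB available (the scenario dgoulet describes)
sample = "MemTotal:       16777216 kB\nMemAvailable:    4194304 kB\n"
info = parse_meminfo(sample)
print(estimate_limit(info, use_available=False) / 2**30)  # 12.0 (GiB)
print(estimate_limit(info, use_available=True) / 2**30)   # 3.0 (GiB)
```

 On a real Linux host, parse_meminfo(open("/proc/meminfo").read()) gives live values. Recomputing the limit periodically from MemAvailable would be one way to get the "moving target" dgoulet suggests, at the cost of a limit that shifts while the relay is under load.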

 > It is difficult to tell what a normal fast relay will endure in terms of
 RAM for Tor over time, but so far, from what I can tell with my relays,
 between 1 and 2 GB is usually what I see (in non-DoS conditions and
 non-Exit).

 I usually see 1-2 GB for non-exits, and closer to 2 GB for exits.

 > I do believe that the network is still fairly usable right now because
 we have big Guards able to use 5, 10, or 12 GB of RAM... Unclear to
 me if firing up the OOM handler more frequently would improve the
 situation, but we should be very careful not to make every relay use a
 "too low amount of RAM" :S.

 If the fastest relay can do 1 Gbps, then that's 125 MB per second. 12 GB
 of RAM is about 100 seconds of traffic. Is it really useful to buffer 100
 seconds of traffic? (Or, under the current load, tens of thousands of
 useless circuits?)
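 The buffer-time arithmetic above, as a small helper. It uses decimal units, so 1 Gbps = 125 MB/s as in the comment; the function name is illustrative:

```python
def seconds_to_drain(queue_bytes: float, link_bits_per_second: float) -> float:
    """Time for a link of the given speed to send queue_bytes of buffered data."""
    return queue_bytes / (link_bits_per_second / 8)


G = 10**9  # decimal giga, matching "1 Gbps = 125 MB per second"
for ram_gb in (1, 2, 12):
    print(ram_gb, "GB ->", seconds_to_drain(ram_gb * G, 1 * G), "s at 1 Gbps")
```

 So 12 GB is 96 s of traffic at 1 Gbps (rounded to 100 s above), while the 1-2 GB that fast non-exit relays typically use corresponds to 8-16 s.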

 So I'm not sure if using more RAM for queues actually helps. In my
 experience, it just increases the number of active connections and CPU
 usage. I don't know how to measure if this benefits or hurts clients. (I
 guess I could tweak my guard and test running a client through it?)

 Here's what happened when I followed my own advice in this thread:
 https://lists.torproject.org/pipermail/tor-relays/2018-January/014021.html

 I have a few big guards that are very close to a lot of the new clients.
 They were using 150% CPU, 4-8 GB RAM, and 15000 connections each. But they
 were not actually carrying much useful traffic.

 I tried reducing MaxMemInQueues to 2 GB and 1 GB, but they still used
 3-7 GB of RAM. This is on 0.3.0 with the destroy cell fix. (But on my slower
 Guards and my Exit, MaxMemInQueues worked really well, reducing the RAM
 usage to 0.5 - 1.5 GB, without reducing the consensus weight.)

 I tried reducing the number of file descriptors; that reduced CPU usage to
 around 110%, because the new connections were closed earlier. It pushed a
 lot of the sockets into the kernel TIME_WAIT state, about 10,000 on top of
 the regular 10,000. (Maybe these new Tor clients didn't do exponential
 backoff?)
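 One way to count the TIME_WAIT sockets mentioned above on Linux is to read the state column of /proc/net/tcp (and /proc/net/tcp6 for IPv6), where state 06 is TIME_WAIT and 01 is ESTABLISHED. A minimal sketch; the function name is illustrative and the sample lines are shortened, since only the first four columns are used:

```python
from collections import Counter

# Subset of Linux TCP state codes (include/net/tcp_states.h), as hex strings
STATE_NAMES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def count_tcp_states(proc_net_tcp: str) -> Counter:
    """Count sockets per state in /proc/net/tcp-formatted text."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) >= 4:
            # fields: "sl:", local_address, rem_address, st, ...
            state = fields[3].upper()
            counts[STATE_NAMES.get(state, state)] += 1
    return counts


# Port 0x2329 is 9001, a common ORPort
sample = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 00000000:2329 00000000:0000 0A 00000000:00000000
   1: 0200000A:2329 0300000A:C350 06 00000000:00000000
   2: 0200000A:2329 0400000A:C351 01 00000000:00000000"""
print(count_tcp_states(sample))
```

 On a relay, count_tcp_states(open("/proc/net/tcp").read())["TIME_WAIT"] gives the live count; sampling it before and after changing the file descriptor limit would show the extra TIME_WAIT sockets described above.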

 I tried DisableOOSCheck 0, and it didn't seem to make much difference to
 RAM or CPU, but it made a small difference to sockets (and it makes sure
 that I don't lose important sockets, like new control port sockets, so I
 left it on).

 I already set RelayBandwidthRate, but now I also set
 MaxAdvertisedBandwidth to about half the RelayBandwidthRate. Hopefully
 this will make the clients go elsewhere. But this isn't really a solution
 for the network.
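 The torrc settings tried above, collected in one place. This is a sketch: the helper name and the example numbers are hypothetical, and only the option names and the "advertise about half of RelayBandwidthRate" rule come from the comment:

```python
def guard_torrc_lines(rate_mbytes: int, max_mem_gb: int) -> list:
    """Render torrc lines for the settings described in the comment."""
    return [
        f"MaxMemInQueues {max_mem_gb} GB",
        "DisableOOSCheck 0",  # keep Tor's out-of-sockets check enabled
        f"RelayBandwidthRate {rate_mbytes} MBytes",
        # advertise roughly half the rate so fewer clients pick this guard
        f"MaxAdvertisedBandwidth {rate_mbytes // 2} MBytes",
    ]


print("\n".join(guard_torrc_lines(rate_mbytes=40, max_mem_gb=1)))
```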

 So I'm out of options to try and regulate traffic on these guards. And I
 need to have them working in about a week or so, because I need to run
 safe stats collections on them.

 I think my only remaining option is to drop connections when the number of
 connections per IP goes above some limit. From the tor-relays posts, it
 seems like up to 10 connections per IP is normal, but these clients will
 make hundreds of connections at once. I think I should DROP rather than
 RST, because that forces the client to timeout, rather than immediately
 making another connection.
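 The per-IP policy above, as a minimal userspace sketch. The limit of 10 comes from the tor-relays observation; the class and method names are hypothetical. In practice this would more likely be enforced in the kernel firewall (for example an iptables connlimit match with a DROP target), since dropping there sends no RST and leaves the client to wait for a timeout:

```python
from collections import defaultdict


class PerIPConnectionLimiter:
    """Hypothetical sketch: accept at most `limit` concurrent connections per IP."""

    def __init__(self, limit: int = 10):
        self.limit = limit
        self.active = defaultdict(int)  # ip -> currently open connections

    def on_connect(self, ip: str) -> bool:
        """Return True to accept; False means drop silently (no RST)."""
        if self.active[ip] >= self.limit:
            return False
        self.active[ip] += 1
        return True

    def on_close(self, ip: str) -> None:
        """Call only for connections that on_connect accepted."""
        if self.active[ip] > 0:
            self.active[ip] -= 1
        if self.active[ip] == 0:
            del self.active[ip]  # avoid growing the table with idle IPs


limiter = PerIPConnectionLimiter(limit=10)
decisions = [limiter.on_connect("192.0.2.1") for _ in range(12)]
print(decisions.count(True), "accepted,", decisions.count(False), "dropped")
```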

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24782#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online