[anti-censorship-team] Need to increase number of tor instances on snowflake-01 bridge, increased usage since yesterday
David Fifield
david at bamsoftware.com
Tue Sep 27 14:54:53 UTC 2022
On Mon, Sep 26, 2022 at 10:39:42AM +0200, Linus Nordberg via anti-censorship-team wrote:
> It seems likely that we're hitting a limit of some sort and next thing
> is to figure out if it's a soft limit that we can influence through
> system configuration or if it's a hardware resource limit.
tor has a default bandwidth limit, but we should be nowhere close to it,
especially distributed across 12 instances:
BandwidthRate N bytes|KBytes|MBytes|GBytes|TBytes|KBits|MBits|GBits|TBits
A token bucket limits the average incoming bandwidth usage on this node
to the specified number of bytes per second, and the average outgoing
bandwidth usage to that same value. If you want to run a relay in the
public network, this needs to be at the very least 75 KBytes for a
relay (that is, 600 kbits) or 50 KBytes for a bridge (400 kbits) — but
of course, more is better; we recommend at least 250 KBytes (2 mbits)
if possible. (Default: 1 GByte)
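To rule it out explicitly (assuming the instances' torrc files live under
/etc/tor/instances/, which may not be the exact layout on snowflake-01),
something like this should come back empty:
# grep -riE 'bandwidthrate|relaybandwidthrate' /etc/tor/instances/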
I do not see any rate limit enabled in /etc/haproxy/haproxy.cfg.
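A quick double-check, which only catches explicit directives such as
rate-limit sessions, a maxconn ceiling, or a stick-table based limit:
# grep -Ei 'rate-limit|maxconn|stick-table' /etc/haproxy/haproxy.cfg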
I checked the number of sockets connected to the haproxy frontend port,
thinking that we may be running out of localhost 4-tuples. It's still in
bounds (but we may have to figure something out for that eventually).
# ss -n | grep -c '127.0.0.1:10000\s*$'
27314
# sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 15000 64000
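That range gives roughly 49000 usable source ports for connections from
127.0.0.1 to 127.0.0.1:10000, so 27314 is a bit over half of the ceiling.
If it does become a problem, two possible mitigations (sketches, not
something I have tried here) are widening the range,
# sysctl -w net.ipv4.ip_local_port_range="1025 65000"
(reserving any listening ports that would then fall inside it via
net.ipv4.ip_local_reserved_ports), or having haproxy listen on additional
loopback addresses or ports so that more 4-tuples are available.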
According to https://stackoverflow.com/a/3923785, some other parameters
that may be important are:
# sysctl net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60
# cat /proc/sys/net/netfilter/nf_conntrack_max
262144
# sysctl net.core.netdev_max_backlog
net.core.netdev_max_backlog = 1000
Ethernet txqueuelen (1000)
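In case these start to matter, the current conntrack table occupancy (to
compare against nf_conntrack_max) and the interface queue length and drop
counters can be read with:
# cat /proc/sys/net/netfilter/nf_conntrack_count
# ip -s link show eno1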
net.core.netdev_max_backlog is the "maximum number of packets, queued on
the INPUT side, when the interface receives packets faster than kernel
can process them."
https://www.kernel.org/doc/html/latest/admin-guide/sysctl/net.html#netdev-max-backlog
But if we were having trouble with backlog buffer sizes, I would expect
to see lots of dropped packets, and I don't:
# ethtool -S eno1 | grep dropped
rx_dropped: 0
tx_dropped: 0
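Likewise, the second hexadecimal column of /proc/net/softnet_stat counts
packets dropped because the per-CPU backlog (netdev_max_backlog) was full,
so it is another place such drops would show up:
# awk '{print $2}' /proc/net/softnet_stat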
It may be something inside snowflake-server, for example some central
scheduling algorithm that cannot run any faster. (Though if that were
the case, I'd expect to see one CPU core at 100%, which I do not.) I
suggest doing another round of profiling now that we have taken care of
the more obvious hotspots in
https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/100
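If snowflake-server exposes (or is temporarily patched to expose) a
net/http/pprof endpoint (I don't remember offhand whether it currently
does, and the port below is a placeholder), grabbing a 30-second CPU
profile would be as simple as:
# go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'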