[tor-relays] Guard flag flapping
starlight.2015q3 at binnacle.cx
Sat Aug 8 00:05:33 UTC 2015
First, I am assuming you are running on a bare-metal
system and not on a virtualized server; everything
below is premised on that. Do not expect a virtual
server or Linux container to perform well as a
high-capacity Tor relay. It's possible to configure a
high-performance VM, but this is an esoteric art,
and one is better off renting a small dedicated
physical server than going that route.
Your story of a relay setup that should measure
fast by all apparent metrics but is given terrible
rankings by BWauths is common this year.
The BWauth scripts are known to be buggy, though
they have supposedly been improved very recently.
'longclaw' just came back online with the "latest"
code, but after starting out two days ago with a
failure to measure 2000 relays, it is still
measuring 1000 relays short of the full population:
https://consensus-health.torproject.org/#bwauthstatus
Scroll down a little and you will see that 'longclaw'
is unique in voting 976 relays not-guard and 1709
relays not-fast. IMO that is a more serious issue
than cold-start glitching, and if cold-start
glitching is all it really is, that is not impressive.
A fifth BWauth is reportedly arriving soon, and it
is expected to help.
Your relays currently are measured thusly:
greendream848
   longclaw  w Bandwidth=1694 Measured=986
   gabelmoo  w Bandwidth=1694 Measured=347
   maatuska  w Bandwidth=1694 Measured=874
   moria1    w Bandwidth=1694 Measured=1550
spacequeen974
   longclaw  w Bandwidth=1698 Measured=493
   gabelmoo  w Bandwidth=1698 Measured=970
   maatuska  w Bandwidth=1698 Measured=1930
   moria1    w Bandwidth=1698 Measured=2130
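As a rough illustration of what the consensus does with these four numbers: my understanding of dir-spec is that when three or more BWauths vote a Measured= value for a relay, the consensus weight is the low-median of those values (the lower of the two middle values when the count is even). The sketch below, using a made-up helper name low_median, only computes that; treat it as an approximation, not the authorities' code.

```shell
#!/bin/sh
# low_median: print the low-median of the integers given as arguments.
# For an even count this is the lower of the two middle values.
# Assumption: this mirrors how the consensus combines Measured= votes.
low_median() {
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }                 # values arrive sorted ascending
        END { print v[int((NR + 1) / 2)] }'
}

low_median 986 347 874 1550    # greendream848 -> 874
low_median 493 970 1930 2130   # spacequeen974 -> 970
```

By that reckoning greendream848 would be weighted at 874 and spacequeen974 at 970, well under the roughly 1700 both relays self-report.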
You can see future and past reports of these in
https://collector.torproject.org/recent/relay-descriptors/votes/
https://collector.torproject.org/archive/relay-descriptors/votes/
where the vote file names include the fingerprint of
the authority that cast the vote:
longclaw is 23D15D9. . .
gabelmoo is ED03BB6. . .
maatuska is 49015F7. . .
moria1 is D586D18. . .
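To pull your relays' lines out of a vote downloaded from one of those CollecTor directories, a small awk filter over the plain-text vote format is enough: each relay entry starts with an "r NICKNAME ..." line and carries a "w Bandwidth=... Measured=..." line. The helper name vote_w_line is hypothetical, and it reports only the first relay with a given nickname, since nicknames are not guaranteed to be unique.

```shell
#!/bin/sh
# vote_w_line VOTEFILE NICKNAME  (hypothetical helper)
# Print the "w Bandwidth=... Measured=..." line belonging to the
# router entry that starts with "r NICKNAME ..." in a v3 vote document.
vote_w_line() {
    awk -v nick="$2" '
        $1 == "r" { inrelay = ($2 == nick) }   # start of a router entry
        inrelay && $1 == "w" { print; exit }   # its bandwidth line
    ' "$1"
}
```

Usage: `vote_w_line VOTEFILE greendream848`, run once per authority's vote.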
That the measurements are all in the same ballpark
does indicate that some subtle issue with the
network and/or equipment may be at work and the
BWauths may not be at fault. But many have
complained that nothing they do seems to work.
If the firewall is performing stateful packet
inspection or any kind of DPI (deep packet inspection),
disable that for all incoming and outgoing Tor
traffic. It's all encrypted anyway, so there's
no point, and DPI can drag down performance
big-time. The directory traffic is unencrypted,
but I've never heard of a firewall with
stateful rules for the Tor directory protocol.
If you can put the system directly on the public
IP address with no firewall or local-rack router, I
recommend doing so. Just make sure iptables rules are
set to protect login and other non-tor access.
Alternatively, disable iptables and strip the
server down so that nothing but the 'tor' process
and 'ssh' are running, and configure 'ssh' to
accept only certificate authentication (be sure to
set up and test cert auth before applying the
setting). Check that only the expected listeners
remain with
lsof -Pn | fgrep LISTEN
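The lsof listing can get long, so a small filter that hides loopback-only listeners makes anything still reachable from outside stand out. This is a sketch that assumes lsof's default column layout, where the address is the field just before the trailing "(LISTEN)"; public_listeners is a made-up name.

```shell
#!/bin/sh
# public_listeners  (hypothetical filter)
# Reads "lsof -Pn | fgrep LISTEN" output on stdin and prints only the
# sockets bound to something other than loopback (127.x or [::1]).
public_listeners() {
    awk '$NF == "(LISTEN)" && $(NF-1) !~ /^(127\.[0-9.]+|\[::1\]):/'
}
```

Usage: `lsof -Pn | fgrep LISTEN | public_listeners`. Ideally only tor's ORPort/DirPort and sshd show up.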
The email daemon should stay up to handle alarms;
just be sure it listens on 127.0.0.1. The same goes
for anything else that is absolutely necessary. Use
the *Port and *Policy settings in torrc to lock down
control and socks access to the daemon.
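A quick sanity check for that torrc lockdown might look like the following. It is a hypothetical helper that only greps for literal option lines, assuming you want SOCKS disabled outright (SocksPort 0, plus SocksPolicy reject * for good measure) and, if a ControlPort is open, some form of control-port authentication (CookieAuthentication 1 or HashedControlPassword). Adjust it to your own policy.

```shell
#!/bin/sh
# torrc_lockdown_warnings TORRC  (hypothetical helper)
# Print one warning per lockdown setting missing from TORRC.
# It only greps for literal option lines; it does not parse torrc fully.
torrc_lockdown_warnings() {
    # SOCKS should be off entirely on a relay-only instance.
    grep -Eqi '^[[:space:]]*SocksPort[[:space:]]+0([[:space:]]|$)' "$1" ||
        echo "SocksPort is not 0 (tor listens for SOCKS on 9050 by default)"
    grep -Eqi '^[[:space:]]*SocksPolicy[[:space:]]+reject[[:space:]]+\*' "$1" ||
        echo "no 'SocksPolicy reject *' line"
    # If a control port is configured (not 0), require authentication.
    if grep -Eqi '^[[:space:]]*ControlPort[[:space:]]+[^0[:space:]]' "$1"; then
        grep -Eqi '^[[:space:]]*(CookieAuthentication[[:space:]]+1|HashedControlPassword[[:space:]])' "$1" ||
            echo "ControlPort is set without CookieAuthentication 1 or HashedControlPassword"
    fi
}
```

Usage: `torrc_lockdown_warnings /etc/tor/torrc` prints nothing when all checks pass.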
One notable sysctl that matters for high-capacity
relays is
net.netfilter.nf_conntrack_checksum = 0
though leaving conntrack checksumming enabled (the
default) would not cause the current poor measurements.
You should change this setting:
net.ipv4.tcp_no_metrics_save = 1
turning this off was to work around a very-
long-ago kernel bug that is fixed everywhere.
Turning it on improves performance.
You might try
net.ipv4.tcp_wmem = 4096 250000 4194304
net.ipv4.tcp_rmem = 4096 375000 4194304
which will cause the congestion window to
get to full size a bit quicker, and these
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 524288
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_keepalive_time = 600
which raise various limits for fast networks
with lots of connections.
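One way to make the tuning above persistent is to keep the values in sysctl.conf syntax and load them with sysctl -p. The function below just prints them; relay_sysctl_conf is a hypothetical name, and it covers only nf_conntrack_checksum, the buffer sizes and the connection limits, so handle the tcp_no_metrics_save line separately.

```shell
#!/bin/sh
# relay_sysctl_conf  (hypothetical helper)
# Print the relay tuning values suggested above in sysctl.conf syntax.
relay_sysctl_conf() {
    cat <<'EOF'
net.netfilter.nf_conntrack_checksum = 0
net.ipv4.tcp_wmem = 4096 250000 4194304
net.ipv4.tcp_rmem = 4096 375000 4194304
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 524288
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_keepalive_time = 600
EOF
}
```

Usage, as root and only once: `relay_sysctl_conf >> /etc/sysctl.conf && sysctl -p`. The nf_conntrack_checksum key only exists while the nf_conntrack module is loaded, so sysctl -p will complain about that line otherwise.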
Make sure these default values are active and
have not been changed to non-default values by
/etc/sysctl.conf:
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_congestion_control = cubic
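To confirm that these defaults are actually in force, compare the live values under /proc/sys with the expected ones. A sketch, with hypothetical names check_sysctl and check_relay_defaults; the optional ROOT argument exists only so the function can be exercised against a scratch directory.

```shell
#!/bin/sh
# check_sysctl KEY EXPECTED [ROOT]  (hypothetical helper)
# Report and fail if the live value of KEY differs from EXPECTED.
# ROOT defaults to /proc/sys; it is a parameter only for testing.
check_sysctl() {
    key=$1 want=$2 root=${3:-/proc/sys}
    file=$root/$(printf '%s' "$key" | tr . /)
    if [ ! -r "$file" ]; then
        echo "$key: not present"
        return 1
    fi
    # Multi-value keys such as tcp_wmem are tab-separated in /proc;
    # squeeze all whitespace to single spaces before comparing.
    have=$(tr -s '[:space:]' ' ' < "$file" | sed 's/ $//')
    if [ "$have" != "$want" ]; then
        echo "$key: is '$have', expected '$want'"
        return 1
    fi
}

# check_relay_defaults: verify the default values listed above.
check_relay_defaults() {
    rc=0
    for kv in tcp_moderate_rcvbuf=1 tcp_timestamps=1 tcp_window_scaling=1 \
              tcp_sack=1 tcp_syncookies=1 tcp_congestion_control=cubic; do
        check_sysctl "net.ipv4.${kv%%=*}" "${kv#*=}" || rc=1
    done
    return "$rc"
}
```

check_relay_defaults prints one line per key that differs and returns nonzero if any do.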
And try adding
TXQUEUELEN=100000
to
/etc/sysconfig/network-scripts/ifcfg-ethX
for each interface tor runs on. To activate it
manually and check the result, run
ip link set qlen 100000 dev ethX
ip link show dev ethX
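The queue length the kernel is actually using can also be read back from sysfs, which is handy after a reboot to confirm the ifcfg setting took effect. These are hypothetical helpers, with ethX standing in for your interface as above; the optional ROOT argument is only for testing.

```shell
#!/bin/sh
# tx_queue_len IFACE [ROOT]  (hypothetical helper)
# Print the transmit queue length of IFACE as reported in sysfs.
# ROOT defaults to /sys/class/net.
tx_queue_len() {
    cat "${2:-/sys/class/net}/$1/tx_queue_len"
}

# qlen_ok IFACE [ROOT]: succeed if the queue holds at least 100000
# packets, the value suggested above.
qlen_ok() {
    len=$(tx_queue_len "$1" "$2") || return 1
    [ "$len" -ge 100000 ]
}
```

Usage: `qlen_ok ethX && echo "queue length OK"`.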
Finally, make sure the kernel is of a vintage with
the Google-advocated increase of the initial
congestion window:
https://lwn.net/Articles/427104/
http://samsaffron.com/archive/2012/03/01/why-upgrading-your-linux-kernel-will-make-your-customers-much-happier
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=442b9635c569fef038d5367a7acd906db4677ae1
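For a quick check of the kernel vintage: as far as I know, the initial-window change in the commit linked above first shipped in mainline 2.6.39, so a version comparison gives a rough answer. This sketch relies on GNU sort -V, and distribution kernels may carry backports, so treat a "no" as a hint to look closer rather than a verdict. has_iw10 is a made-up name.

```shell
#!/bin/sh
# has_iw10 [RELEASE]  (hypothetical helper)
# Rough check: succeed if the kernel release (default: uname -r) is
# 2.6.39 or newer. Assumption: 2.6.39 is the first mainline release
# with the larger initial congestion window; backports are not detected.
has_iw10() {
    rel=${1:-$(uname -r)}
    base=${rel%%-*}   # strip suffixes such as "-229.el7.x86_64"
    [ "$(printf '%s\n' 2.6.39 "$base" | sort -V | head -n 1)" = 2.6.39 ]
}
```

Usage: `has_iw10 && echo "kernel is new enough"`.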
If you end up implementing any of the above and it
works, please describe the results in a tor-relays
post.