[tor-bugs] #33211 [Circumvention/Snowflake]: proxy-go sometimes gets into a 100+% CPU state
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sat Feb 22 22:56:36 UTC 2020
#33211: proxy-go sometimes gets into a 100+% CPU state
-------------------------------------+------------------------------
Reporter: dcf | Owner: cohosh
Type: defect | Status: needs_review
Priority: Medium | Milestone:
Component: Circumvention/Snowflake | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------+------------------------------
Comment (by dcf):
Good work with the CCM discovery. It's different from what I filed the
ticket about, but it will help everything regardless.
I reproduced the symptoms I was thinking of and got a profile. Files are
at https://people.torproject.org/~dcf/bug33211/. I had to let proxy-go run
for an hour (details below) until it happened. Then I let it run for 30
minutes at high CPU so that whatever was going wrong would dominate the
profile. The [https://people.torproject.org/~dcf/bug33211/proxy-
go.profile.2020-02-22T20:32:32Z.png graph] tells the story.
`pion/dtls/internal/crypto/ccm.(*ccm).cbcRound` and
`crypto/aes.encryptBlockAsm` are major contributors, but I believe they
are a separate issue that is fixed by your comment:16. The culprit for
what I was describing of is
`pion/sctp.(*payloadQueue).markAllToRetrasmit`, called by
`sctp.(*Association).onRetransmissionTimeout`. It looks to me like a timer
somewhere is becoming negative and becoming a busy-loop (similar to the
bug in quic-go from #33401).
During the time of high CPU usage, proxy-go wasn't producing an abnormal
amount of log output (probably about 1 line per second on average), nor
was it sending an abnormal amount of packets.
{{{
go tool pprof proxy-go.380b1331.profiled 'proxy-
go.profile.2020-02-22T20:32:32Z'
File: proxy-go.380b1331.profiled
Type: cpu
Time: Feb 22, 2020 at 1:32pm (MST)
Duration: 1.59hrs, Total samples = 1.08hrs (67.68%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top5
Showing nodes accounting for 23.95mins, 37.03% of 64.68mins total
Dropped 999 nodes (cum <= 0.32mins)
Showing top 5 nodes out of 139
flat flat% sum% cum cum%
8.07mins 12.48% 12.48% 8.07mins 12.48% runtime.futex
4.65mins 7.20% 19.67% 11.15mins 17.23%
github.com/pion/sctp.(*payloadQueue).markAllToRetrasmit
4.35mins 6.72% 26.39% 6.35mins 9.82% runtime.mapiternext
3.98mins 6.15% 32.55% 3.98mins 6.15% crypto/aes.encryptBlockAsm
2.90mins 4.48% 37.03% 5.03mins 7.78% runtime.scanobject
(pprof) evince
}}}
== Procedure
Here is what I did to collect the profile, starting at commit
380b133155ad725126bc418d0e66b3c550b4c555 patched with attachment:proxy-go-
profile.patch.2. I started a local broker, proxy, and bridge, then set up
a loop to have a client connect every 2.5 minutes. I was also
intermittently downloading through the SOCKS port 10000, but I don't know
if that matters.
{{{
broker/broker --disable-tls --addr 127.0.0.1:8000
proxy-go/proxy-go --broker http://127.0.0.1:8000/ --relay
ws://127.0.0.1:8080/
tor -f torrc.server
while true; do timeout 150 tor -f torrc.client SOCKSPort 10000; done
}}}
torrc.server:
{{{
DataDirectory datadir-server
SocksPort 0
ORPort 9001
ExtORPort auto
BridgeRelay 1
AssumeReachable 1
PublishServerDescriptor 0
ServerTransportListenAddr snowflake 0.0.0.0:8080
ServerTransportPlugin snowflake exec server/server --disable-tls --log
snowflake-server.log
}}}
torrc.client:
{{{
UseBridges 1
DataDirectory datadir-client
ClientTransportPlugin snowflake exec client/client \
-url http://127.0.0.1:8000/ \
-ice stun:stun.l.google.com:19302 \
-log snowflake-client.log \
-max 2
Bridge snowflake 0.0.3.0:1
}}}
* I started everything at 2020-02-22 20:32:55.
* proxy-go jumped to 200% CPU at about 2020-02-22 21:36:40, roughly one
hour later. (It happened suddenly, not a gradual increase. The fans start
spinning.)
* After 5 minutes it was up to 250% CPU.
* After 8 minutes it was up to 280% CPU.
* I killed the tor client loop after 15 minutes at 2020-02-22 21:51:55.
The high CPU use of proxy-go continued, being in the range of about
260–295%.
* I killed the tor server after 30 minutes at 2020-02-22 22:05:46. proxy-
go's CPU use immediately became negligible.
* I killed proxy-go at 2020-02-22 22:08:02 and saved the profile.
The last log output of proxy-go was:
{{{
2020/02/22 22:04:23 Write 536 bytes --> WebRTC
2020/02/22 22:04:26 Write 536 bytes --> WebRTC
2020/02/22 22:04:28 Write 536 bytes --> WebRTC
2020/02/22 22:04:29 Write 536 bytes --> WebRTC
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 io.Copy inside CopyLoop generated an error: websocket:
close 1006 (abnormal closure): unexpected EOF
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 OnClose channel
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 datachannelHandler ends
2020/02/22 22:05:46 datachannelHandler ends
}}}
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/33211#comment:17>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list