[tor-bugs] #21394 [Core Tor/Tor]: connection timeouts are affecting Tor Browser usability
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Feb 8 05:12:09 UTC 2017
#21394: connection timeouts are affecting Tor Browser usability
--------------------------------------------+---------------------
Reporter: arthuredelstein | Owner:
Type: defect | Status: new
Priority: Medium | Milestone:
Component: Core Tor/Tor | Version:
Severity: Normal | Resolution:
Keywords: tbb-performance, tbb-usability | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
--------------------------------------------+---------------------
Comment (by arthuredelstein):
I had a conversation with arma on IRC and he made many good suggestions on
how to go about investigating this further (reprinted with permission):
16:49 < arthuredelstein> In general, do connection timeout errors come
from the exit node, or from the client?
16:50 < armadev> it means you sent your begin cell, and then you didn't
get an end cell or a connected cell after 10 seconds
16:50 < armadev> it could be that you don't really have a tls connection
to your guard at all, you just think you do
16:51 < armadev> it could be that the exit receives the begin cell and
quietly drops it
16:51 < armadev> or maybe it gets the begin cell and starts its dns
resolve and that takes a while
16:51 < armadev> one way to investigate further might be to see if you
ever get a connected or end cell if you waited longer
16:52 < arthuredelstein> Ah, that's a good idea.
16:54 < arthuredelstein> Do you have a hypothesis why there are so many
timeouts? Do you think exits are dropping cells?
16:54 < armadev> i am wondering if it has to do with the ipv6 thing
16:54 < armadev> we have a bunch of bugs in ipv6 handling
16:55 < arthuredelstein> that's interesting
16:56 < arthuredelstein> in other words, handling at the exit?
16:57 < armadev> yes
16:57 < armadev> is there some pattern with which exits are on problem
circuits?
16:57 < armadev> you have the circuit events i hope so you can do the
stats?
16:57 < armadev> it is also possible that some exits, or even really just
a few but really big ones, and running out of file descriptors or
something
16:58 < arthuredelstein> another good idea. I will look into that.
16:58 < armadev> s/and running/are running/
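
The per-exit statistics suggested above could be tallied roughly as follows. This is a minimal sketch, assuming the test harness already records one (exit fingerprint, timed out?) pair per connection attempt. The record format, the function name exit_timeout_rates, and the sample fingerprints are all hypothetical and do not come from Tor or from this ticket.

```python
from collections import Counter


def exit_timeout_rates(attempts, min_attempts=1):
    """Return (exit, timeout_rate) pairs, worst exits first.

    `attempts` is an iterable of (exit_fingerprint, timed_out) pairs.
    This record format is an assumption made for this sketch.
    """
    total = Counter()
    failed = Counter()
    for exit_fp, timed_out in attempts:
        total[exit_fp] += 1
        if timed_out:
            failed[exit_fp] += 1
    # Skip exits with too few samples to say anything useful.
    rates = {fp: failed[fp] / n for fp, n in total.items() if n >= min_attempts}
    # Highest timeout rate first; ties broken by fingerprint for stable output.
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: placeholder names, not real relay fingerprints.
sample = [
    ("EXIT_A", True),
    ("EXIT_A", True),
    ("EXIT_B", False),
    ("EXIT_B", True),
    ("EXIT_C", False),
]

for fingerprint, rate in exit_timeout_rates(sample):
    print(f"{fingerprint}: {rate:.0%} of attempts timed out")
```

If a handful of exits dominate the top of this list, that would point toward the "a few big exits" explanation; a flat distribution would point closer to the client or guard.
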
16:59 < armadev> people.tp.o has an ipv4 and ipv6 address. can you pick
something simple and static that's only v4, and is that different?
17:01 < arthuredelstein> makes sense
17:02 < arthuredelstein> Something that made me wonder if it's something
closer to the client or guard is that in my first batch of tests (to
people.torproject.org) half of the attempted connections were double
timeouts, meaning two circuits with different exits failed before a
successful connection was made.
17:03 < arthuredelstein> it's -> the cause of the timeouts is
17:08 < armadev> another thing to explore is sending cells end-to-end on
the circuit that we know should elicit an immediate response
17:08 < armadev> like a begin to 127.0.0.1
17:08 < armadev> which should immediately reply with 'end, exitpolicy'
17:08 < armadev> and bypass any attempts by the exit to do a dns resolve,
open a socket, make a tcp connection, etc
17:16 < arthuredelstein> What's the easiest way to send a begin cell?
17:17 < armadev> make a socks request?
17:17 < armadev> there might be something on the client side that tries to
block a request to a destination it knows will fail
17:17 < armadev> and also tor browser does isolation by socks parameters
so the new socks request will be isolated to a different circuit
17:18 < armadev> but i bet fixing those will still be more fun than my
other answer, which is to check out how to call
connection_ap_handshake_send_begin()
17:19 < arthuredelstein> Right. I think Tor Browser is blocking
connections to 127.0.0.1.
17:19 < armadev> heck, the browser itself might be blocking those too
17:19 < arthuredelstein> or possibly not making a socks connection
17:19 < armadev> and the tor client will be blocking them even if the
browser isn't
17:19 < armadev> i guess that's yet another experiment:
17:19 < armadev> do this same experiment with your tor client, no browser
involved
17:20 < armadev> and no weird socks isolation
17:20 < arthuredelstein> Yes.
17:20 < armadev> and no weird preferipv6 socksport flag
17:21 < arthuredelstein> aha
17:24 < arthuredelstein> I guess I can also try connecting to port 80 of
the exit's IP address as an alternative to 127.0.0.1.
17:25 < armadev> good idea
17:25 < armadev> (though then you have to guess the exit already)
17:25 < arthuredelstein> Yeah, I would need to turn off socks isolation.
17:25 < arthuredelstein> Or maybe do this outside the browser
17:26 < arthuredelstein> maybe I need to get acquainted with stem so I can
automate these tests
17:27 < arthuredelstein> assuming the browser isn't causing the problem
somehow
17:29 < armadev> having it automated would be extra cool because then it
could be done again later without redoing all the work
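
One way to automate the "tor client, no browser involved" experiment is a plain SOCKS5 request against the client's SocksPort, which makes tor send a begin cell. The following is a rough sketch using only the Python standard library rather than stem. It assumes a standalone tor listening on 127.0.0.1:9050; the function names and the 60-second wait are hypothetical choices for illustration. It uses no SOCKS authentication, so it adds no isolation by SOCKS parameters. The reply codes are the generic ones from the SOCKS5 specification.

```python
import socket
import struct
import time

# Generic SOCKS5 reply codes (RFC 1928).
SOCKS5_REPLIES = {
    0x00: "succeeded",
    0x01: "general SOCKS server failure",
    0x02: "connection not allowed by ruleset",
    0x03: "network unreachable",
    0x04: "host unreachable",
    0x05: "connection refused",
    0x06: "TTL expired",
    0x07: "command not supported",
    0x08: "address type not supported",
}


def build_socks5_connect(host, port):
    """Build a SOCKS5 CONNECT request that passes the hostname to the proxy."""
    name = host.encode("ascii")
    if not 0 < len(name) < 256:
        raise ValueError("hostname must be 1 to 255 bytes long")
    # VER=5, CMD=CONNECT, RSV=0, ATYP=domain name, then length, name, port.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


def recv_exact(sock, count):
    """Read exactly `count` bytes or raise if the proxy closes early."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("SOCKS proxy closed the connection")
        data += chunk
    return data


def probe(host, port, socks_addr=("127.0.0.1", 9050), wait=60.0):
    """Return (result, seconds); result is None if nothing arrived in `wait`."""
    start = time.monotonic()
    with socket.create_connection(socks_addr, timeout=wait) as sock:
        sock.sendall(b"\x05\x01\x00")  # offer only the "no authentication" method
        if recv_exact(sock, 2) != b"\x05\x00":
            raise ConnectionError("proxy did not accept the no-auth method")
        sock.sendall(build_socks5_connect(host, port))
        try:
            reply = recv_exact(sock, 4)
        except socket.timeout:
            return None, time.monotonic() - start
    return SOCKS5_REPLIES.get(reply[1], "unknown reply"), time.monotonic() - start
```

Calling probe("people.torproject.org", 80) in a loop and logging the result and elapsed time would give a browser-free baseline. As noted in the conversation, the tor client may itself refuse destinations such as 127.0.0.1, so that particular target may never reach an exit this way.
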
17:33 < armadev> let me hunt down a ticket you'll find fun and related
(though alas not the same)
17:35 < armadev> #5830
17:40 < arthuredelstein> And I see you also mention the possibility of
instrumenting a browser.
19:49 < armadev> yet another thought: if this happens pretty consistently,
can you collude with an exit relay to get debug-level logs at the time of
the failure? to see what it sees and what it doesn't see? safelogging
might make that harder.
19:49 < arthuredelstein> yeah, that would be great
19:50 < armadev> the precursor to that idea is: can you induce this
behavior in a chutney network?
19:50 < armadev> i would assume no, because it requires real users, real
load, real broken exits. but who knows!
19:50 < armadev> oh, and another: if you're curious if it's your guard, do
the experiment again with a different guard!
19:51 < arthuredelstein> yeah, I should definitely do that!
19:52 < armadev> if your guard is overloaded, you could easily be seeing a
delay there
19:52 < armadev> or the intermediate node too, for that matter
19:52 < arthuredelstein> right
19:52 < armadev> where you have to wait for somebody's freight train of
packets to move before you can get your connected cell
19:54 < armadev> i guess category 1 of problem, you send your begin and it
vanishes. you'll never get an answer.
19:55 < armadev> category 2, everything's working, it's just
slow/congested, and you need more patience than the hard-coded 10s
timeout.
19:55 < armadev> cranking up the timeout should help distinguish, for
starters.
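
The two categories can be told apart from measurements taken with a longer wait, roughly as sketched below. The 10-second value is the timeout mentioned in the conversation; the function name, the labels, and the convention that None means "nothing arrived during the extended wait" are assumptions made for this illustration.

```python
STREAM_TIMEOUT = 10.0  # seconds; the hard-coded timeout mentioned above


def classify_attempt(response_delay, stream_timeout=STREAM_TIMEOUT):
    """Classify one attempt measured with an extended wait.

    `response_delay` is the number of seconds until a connected or end
    cell arrived, or None if nothing arrived before the extended wait
    expired (a hypothetical convention for this sketch).
    """
    if response_delay is None:
        # Consistent with category 1, but only as far as the extended wait goes.
        return "category 1: no response"
    if response_delay > stream_timeout:
        return "category 2: slow or congested"
    return "within timeout"


# Hypothetical measurements, in seconds.
for delay in (3.2, 14.7, None):
    print(delay, "->", classify_attempt(delay))
```

A None result only shows that no answer arrived within the extended wait, so a very slow category 2 case can still be mislabeled as category 1.
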
19:55 < arthuredelstein> yes
20:03 < arthuredelstein> Are there cases where a properly-behaving exit is
expected to have category 1 behavior? Or should it always return an error
message to the client if a tcp connection fails?
20:10 < armadev> every non-response is a bug
20:10 < armadev> are there bugs? there used to be! we don't know of any
now.
20:11 < armadev> but of course, weird tcp stacks, and firewalls with rules
that drop packets, can induce long timeouts
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/21394#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online