[tor-bugs] #6414 [Ooni]: Automating Bridge Reachability Testing
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Wed Jul 18 22:47:46 UTC 2012
#6414: Automating Bridge Reachability Testing
------------------------------------------------------------------+---------
Reporter: isis | Owner: isis
Type: project | Status: new
Priority: normal | Milestone:
Component: Ooni | Version:
Keywords: bridge reachability, metrics-db, automation, testing | Parent:
Points: | Actualpoints:
------------------------------------------------------------------+---------
An effort was made earlier this year to create a discovery system for current
bridge reachability status (#5028). This resulted in the development and
deployment of OONI's BridgeT [26], which uses txtorcon to attempt a
connection, speaking the full Tor protocol, to the set of bridges being
tested. Some bridges were scanned, and results were gathered. We would like to
go back and automate this process, and possibly revise it if a better
methodology is proposed. Anyone with ideas or interest should feel free to
join the discussion here.
While this automation is intended to be geolocationally agnostic, it is
trivial to test a bridge's reachability from a country which does not block
Tor, and therefore the automation methodology should be developed according
to the worst-case scenarios. Countries which block Tor, or have blocked Tor,
include China, Iran, Lebanon, Qatar, United Arab Emirates, and Ethiopia. In
order to ensure that as few Tor bridges as possible are blocked during
reachability testing, it seems wise to assume that the test is being
conducted from one of these countries. Also, any test methodology which
produces accurate results from inside China or Iran would likely work just as
well from any non-Tor-blocking country.
'''Brief Overview of Dynamic Tor Bridge Blocking'''
From my understanding so far (please correct me if I have misunderstood
something, or if there is more information), China's mechanism for
blocking Tor bridges takes the following steps (unconfirmed data is
prefaced by a question mark):
1. OP --> OR/Bridge Connection
   a. Alice (OP/client in China) connects to Bob (OR/bridge), completes
      the TLS handshake, and sets up circuits.
   b. This works for roughly fifteen minutes.
2. Protocol Identification & Fingerprinting
   a. The GFC identifies Tor by fingerprinting the cipher list in the
      TLS ServerHello.
   b. Tests for the precise trigger in the fingerprint were conducted
      (I'll leave said tester(s) anonymous unless they would like to
      speak up) by fuzzing the TLS handshake ServerHello, and the precise
      fingerprint for triggering the GFC's nascent probes was determined
      to be a specific 5 bytes. (?) It was also found that the GFC blocks
      packets <= 79 bits.
   c. Philipp Winter's research showed that fragmentation of the
      ciphersuite list would not trigger a probe [5].
3. Network Enumeration
   a. The GFC adds Bob's IP and port to a queue of addresses to be
      checked. These queues are processed every fifteen minutes (hence
      why Alice's connection functions normally at first).
   b. A probe is sent to Bob during queue processing. The GFC probes are
      not yet fully understood, and unverified data in this section is
      prefaced by a '?'. Thus far, the following is believed to occur:
      * (?) Reportedly (speak up if you wish), there are eight "edge
        routers" in China. The reporter stated that there was "one for
        each province"; however, there are twenty-two provinces in the
        PRC -- twenty-three if you count Taiwan. There is one "core
        router" which controls/routes to the eight "edge routers".
        Because all traffic into and out of China passes through these
        eight routers, all netblocks within China are essentially a
        private network behind the "edge routers". (See question #2
        below.)
      * (?) Because these "edge routers" are intercepting all traffic,
        they are able to temporarily hijack any IP from the contained
        netblocks.
      * A hijacked IP and a random port (the range appears to be
        ~35000-60000) are used as the source to send a probe to the
        queued IP:port of the suspected bridge. (See question #3 below.)
      * The probe does a TCP connect.
      * Then it sends a TLS ClientHello and waits for the cipher list in
        the ServerHello message.
      * If the cipher list matches that used by Tor, the IP:port gets
        blacklisted. Previous research has shown that this blacklisting
        is not permanent, but lasts for 12 hours after the last
        successful connection by a probe [1]. (See question #4 below.)
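The timing in the steps above can be sketched as a small model. This is only an illustration in Python 3: the fifteen-minute queue interval and the 12-hour block lifetime are the figures reported above, while the function names are hypothetical and not part of any existing tool.

```python
from datetime import datetime, timedelta

# Figures reported in the overview above; both are observations, not guarantees.
QUEUE_INTERVAL = timedelta(minutes=15)  # how often the probe queues are processed
BLOCK_DURATION = timedelta(hours=12)    # block lifetime after the last successful probe [1]


def latest_first_probe(first_connection):
    """Upper bound on when the first probe arrives, assuming the queue
    holding the bridge is processed within one QUEUE_INTERVAL."""
    return first_connection + QUEUE_INTERVAL


def block_expiry(successful_probe_times):
    """Return when the block would lapse, or None if no probe ever succeeded."""
    if not successful_probe_times:
        return None
    return max(successful_probe_times) + BLOCK_DURATION


def is_blocked(now, successful_probe_times):
    """True while `now` falls inside the block window of the latest probe."""
    expiry = block_expiry(successful_probe_times)
    return expiry is not None and now < expiry
```

A scanner could use something like `is_blocked()` to decide whether a bridge that was probed earlier is worth re-testing yet, or whether it should wait for the window to lapse.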
== Testing Bridge Reachability ==
As Roger has stated on the Tor Blog, we can do either active or passive scans
to check if a bridge has been blocked [4]. Passive scans, wherein either the
bridge or the client reports connections, are unreliable without results from
active scans in the former case [5], and could potentially reduce privacy and
anonymity in the latter case.
'''Active Scans'''
'''Direct Methods'''
From most innocuous (least Tor-like) to most conspicuous (most Tor-like):
'''ICMP type-8 ping / echo'''
Tells us if the host running the Tor bridge is online, but not necessarily
if the ORPort is open.
'''TCP ping / ACK'''
If TCP ACKs are timed to be sent infrequently (probably no more than one
every five minutes or so), they can appear to be random network noise
rather than a scan. If we get a RST back, we know that we can at least
communicate with the bridge's ORPort through the GFC. This might look odd
if it gets noticed, especially since the GFC is stateful and might realize
the ACKs are unsolicited.
'''TCP SYN'''
This still doesn't tell us if Tor is running, but, again, a SYN/ACK would
let us know that the ORPort is reachable and accepting connections; a RST
would mean that it is reachable and not accepting connections (or that the
GFC is sending false TCP RSTs); and no response would mean that the GFC, or
some other hop, is dropping packets. Philipp Winter's research showed that
the client's SYN is transmitted through the GFC, which instead drops the
SYN/ACK response of known Tor relays/bridges [2].
'''TCP connect()'''
We could try a normal full TCP connect (SYN, SYN/ACK, ACK). This would be
the most genuine-to-the-Tor-protocol test available for regions where SSL is
being blocked. It could be useful here to test different types of
fragmentation, for example, the old trick involving overlapping fragments to
rewrite the TCP headers in the first fragment [25].
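As a rough illustration of the TCP connect() test, here is a minimal Python 3 sketch using only the standard socket module. The function name and the result labels are placeholders of mine, not anything from OONI or BridgeT, and it only handles IPv4 addresses.

```python
import socket


def tcp_connect_status(host, port, timeout=10.0):
    """Classify one full TCP connect() attempt.

    Returns 'open' (handshake completed), 'refused' (a RST came back),
    'timeout' (no answer, e.g. packets dropped on the path), or 'error'.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Unreachable network, no route to host, and similar failures.
        return "error"
    finally:
        sock.close()
```

Note that 'refused' cannot distinguish a genuine RST from the bridge from a forged RST injected along the path, and 'timeout' cannot say which hop dropped the packets, so the result only narrows down what happened.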
'''SSL Handshake'''
We could try doing a normal SSL handshake, as if contacting, for example,
an Apache webserver over HTTPS. Another interesting idea would be to run
an SSLObservatory from inside China, and simply pretend that the bridges
are HTTPS webservers, which would look just like the normal SSLObservatory
for bridges whose ORPort is set to :443 [14, 15]. As of this morning, a
quick check on Tor relays shows that 27% of relays are run on :443:
{{{
isis@acab:/var/lib/tor$ cat cached-microdesc-consensus | grep -e "^r\ [a-zA-Z0-9]*\ /*" \
> | grep " 443 " -c
779
isis@acab:/var/lib/tor$ cat cached-microdesc-consensus | grep -e "^r\ [a-zA-Z0-9]*\ /*" -c
2912
isis@acab:/var/lib/tor$ python -c 'from __future__ import division; a=779/2912; print a'
0.267513736264
}}}
with the most common ports being:
{{{
isis@acab:/var/lib/tor$ cat cached-microdesc-consensus | grep -e "^r\ [a-zA-Z0-9]*\ /*" \
> | cut -d " " -f 7 | sort | uniq -ic | sort -gr
1592 9001
762 443
217 80
34 9090
33 8080
21 9002
20 444
11 9031
11 110
9 22
7 21
[...]
}}}
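The same tally can be done in Python, which is probably where it would end up if the scanner needs these numbers. This is a sketch in Python 3 that assumes, like the `cut -f 7` above, that the ORPort is the seventh space-separated field of each "r" line in the microdesc consensus; the function names are hypothetical.

```python
from collections import Counter


def orport_counts(consensus_lines):
    """Count relays per ORPort from the "r" lines of a microdesc consensus."""
    counts = Counter()
    for line in consensus_lines:
        fields = line.split()
        # Assumed layout: r nickname identity date time IP ORPort DirPort
        if len(fields) >= 8 and fields[0] == "r":
            counts[fields[6]] += 1
    return counts


def port_share(counts, port):
    """Fraction of counted relays whose ORPort equals `port` (a string)."""
    total = sum(counts.values())
    return counts[port] / total if total else 0.0
```

For example, `port_share(orport_counts(open("/var/lib/tor/cached-microdesc-consensus")), "443")` would give the share of relays on :443, and `orport_counts(...).most_common()` the list of most common ports.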
I would assume that the percentage of bridges running on :443 is higher than
that of relays (question #5). We could safely automate the testing of those
relays without actually speaking Tor to them, by appearing to be an
SSLObservatory (question #6). This would provide us with an extensive set
of canaries to help mitigate the zig-zag enumeration attack [9] (see
question #7). However, in regions which block Tor based on the ciphersuite
list in the ServerHello, such as in Iran in June 2011, it doesn't matter
what ciphersuite we send as the client [16].
For those bridges not running on :443, we could have the bridge scanner mimic
another protocol and service which uses TLS/SSL, such as IMAPS or FTPS; for
instance, it could pretend to be a client connecting to a Dovecot or vsftpd
server.
'''Tor TLS/SSLv3 Handshake'''
We can drive a Tor client, or a script pretending to be Tor (which should
know about the different handshake versions, specifically their command
and CERT cells [10]), to handle the TLS negotiation. Interestingly, for
the v2 and v3 protocols, we can use any ciphersuite list we like, as long
as we include
  TLS_DHE_RSA_WITH_AES_256_CBC_SHA
  TLS_DHE_RSA_WITH_AES_128_CBC_SHA
  SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA
  SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
in addition to at least one extra that is not any of those four. Tor clients
before 0.2.3.11-alpha send a fixed ciphersuite list, and the GFC sends a
probe based on this fixed ciphersuite list [12]. It is apparently also the
case that the GFC will ''not'' send a probe if the standard fixed ciphersuite
list is altered by at least two ciphers [12]. To assist with this, hellais
wrote a handy Python script for grabbing the default ciphersuite list from
the source code of Firefox [13]. Also, as mentioned previously, we can
fragment the sending of the ciphersuite list to avoid triggering a probe [5].
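The ciphersuite rule above is easy to check mechanically before a scanner sends anything. The following Python 3 sketch only encodes the rule as stated in this ticket (the four listed suites plus at least one other); the function name is hypothetical, and the suite names are plain labels, not identifiers from any particular TLS library.

```python
# The four suites named above, as plain labels.
REQUIRED_SUITES = (
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA",
    "SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA",
)


def usable_for_v2_v3(ciphers):
    """True if `ciphers` contains all four required suites plus at least
    one suite that is not among those four."""
    offered = set(ciphers)
    has_required = all(name in offered for name in REQUIRED_SUITES)
    has_extra = bool(offered - set(REQUIRED_SUITES))
    return has_required and has_extra
```

A candidate list, for instance one derived from Firefox's defaults via the script in [13], could be run through this check before it is used in a test handshake.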
'''Indirect Methods'''
As Roger also mentions, we could use some variant of the idle scan
[4, 8, 17]. There are a few:
1. Use nmap / hping.
   a. For nmap, there is an NSE script for zombie discovery, which can be
      combined with blockfinder to collect lists of hosts (probably
      printers or other archaic networked devices) with globally
      sequential IPIDs [7, 18].
2. Use idlescanner, a Python script which uses the "content upload"
   feature of popular sites, e.g. Reddit, Imgur, Facebook, Digg, Tinypic,
   Tineye, etc., to attempt a connection to the bridge [19, 20]. This may
   not be entirely accurate, because it is based purely on waiting for the
   upload site to time out.
3. Use FTP PROXY or some other obscure bounce mechanism [21]. These need
   to be further researched.
4. Now we start to get into some crazier ideas. If we set up a bridge
   purposefully to act as a canary, then we could send, from a box inside
   China, a bunch of TCP SYNs with spoofed IP headers to the canary bridge
   to trigger a bunch of probes. Then we trigger the probes with something
   (Winter wrote a program to do this called tcis [22, 23], and hellais
   ported it to Python in OONI [24]), forcing the probes to go after the
   canary bridge. During the two minutes that the probes have hijacked IP
   addresses, we use the probes' hijacked IP addresses as zombies for an
   idle scan of the bridge. This would require some preliminary mucking
   with the probes to see if they have any mechanism we could leverage to
   "see" if the bridge's packets made it to the probe. Basically we force
   the probe to hijack an IP, which we then zombify while it's chasing the
   canary, and get the zombie probe to scan the bridge for us, without
   ''it'' actually scanning it, so it doesn't get blocked, and the traffic
   doesn't look suspicious to anyone keeping an eye on the probes.
5. A commenter on the Tor blog had the idea to try to "borrow a Chinese
   botnet" to do the scans for us, since the botnet would probably attract
   a lot more attention from the Chinese officials than any number of Tor
   bridges. Also, with this idea, the scan could be made to look like your
   standard botnet running around launching PHP exploits at everyone and
   their mothers. This is a highly entertaining idea, but it's also a bit
   unethical (though I'm not certain -- do the ends justify the means in
   this case?), and it might come back to bite us.
   a. If there were a way to get an in-country botnet to "take notice" of
      certain bridges, we could do a sort of "Here boy, fetch!" trick. For
      example, if a botnet appears to have infected hosts report back to
      an IRC channel, or to be scanning for Windows hosts with port 139
      open, we could mimic the responses an infected host would give while
      spoofing the bridge's IP. I have no idea how feasible or reliable
      that would be.
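For the nmap/hping style idle scan in item 1, the interesting part to automate is interpreting the zombie's IP ID before and after the spoofed SYN. Below is a sketch of just that interpretation step in Python 3, following the usual description of the idle scan [8]; it sends no packets, and the function name and labels are mine.

```python
def classify_idle_scan(ipid_before, ipid_after):
    """Interpret a zombie's IP ID change around one spoofed SYN.

    Assumes the zombie uses a globally sequential 16-bit IP ID and was
    otherwise idle between the two measurements.
    """
    delta = (ipid_after - ipid_before) % 65536  # the IP ID field wraps at 2**16
    if delta == 2:
        # Zombie sent a RST to the target's SYN/ACK, plus its reply to us.
        return "open"
    if delta == 1:
        # Only the reply to us: the target answered with a RST or not at all.
        return "closed-or-filtered"
    # The zombie was not idle, or its IP IDs are not sequential.
    return "inconclusive"
```

Since a bridge behind a filter looks the same as a closed port here, repeated measurements against several zombies would be needed before calling a bridge blocked.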
'''Automation Concerns and Desired Features'''
We should avoid scanning bridges that we suspect are not blocked.
Therefore, eventually there should be an easy way to automate feedback
loops between Karsten's metrics and the bridge scanner. That way, once
connections in a certain country drop significantly, the automated tests
initiate in order to discover if those bridges are in fact unreachable.
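One possible shape for the trigger side of that feedback loop is sketched below in Python 3. How the per-country counts would actually be pulled from the metrics data is left open; the median baseline and the 50% threshold are placeholder assumptions of mine, not values taken from Karsten's work.

```python
from statistics import median


def connections_dropped(recent_daily_counts, today_count, drop_ratio=0.5):
    """True if today's count fell below `drop_ratio` times the median of
    the recent daily counts for one country."""
    if not recent_daily_counts:
        return False  # no baseline, so nothing to compare against
    baseline = median(recent_daily_counts)
    return baseline > 0 and today_count < drop_ratio * baseline
```

The scanner would then only queue bridges for testing from a country when `connections_dropped()` fires for that country, which keeps us from scanning bridges we have no reason to suspect are blocked.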
'''Design Features:'''
1. Allow for either eventual integration with, or some type of feedback
   mechanism for, metrics-db.
2. Should be automatable in a safe manner, i.e. the bridge scanner should
   know that a full Tor connection to a specific bridge will likely result
   in that bridge being blocked, and thereby skip running any test which
   includes a full Tor connection.
3. Should be easily incrementable, meaning it should be simple to tell the
   test "only try TCP SYNs for this list of bridges", or "try everything
   up until a Tor-specific TLS/SSL handshake".
4. GeoIP awareness.
'''Implementation'''
I propose the test have all of the Active Direct Methods outlined above, and
an easy way to test one at a time. For the actual testing, I want to err on
the side of caution, in order to avoid getting bridges blocked. Therefore,
during bridge reachability testing, we should test via the most innocuous
method first, wait a while (probably a day or two), see what we learn, then
proceed to the next method.
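To make the "most innocuous first" ordering and the "try everything up until X" feature concrete, here is a small Python 3 sketch. The enum members mirror the Active Direct Methods listed earlier; the class and function names are hypothetical, and nothing here runs an actual test.

```python
from enum import IntEnum


class Method(IntEnum):
    """Direct methods, ordered from least to most Tor-like."""
    ICMP_PING = 1
    TCP_ACK = 2
    TCP_SYN = 3
    TCP_CONNECT = 4
    SSL_HANDSHAKE = 5
    TOR_HANDSHAKE = 6


def plan_for_bridge(ceiling, tor_handshake_risks_block=True):
    """Return the methods to run, in order, up to and including `ceiling`.

    When a full Tor handshake is expected to get the bridge blocked, it is
    dropped from the plan even if the ceiling would allow it.
    """
    methods = [m for m in Method if m <= ceiling]
    if tor_handshake_risks_block:
        methods = [m for m in methods if m is not Method.TOR_HANDSHAKE]
    return methods
```

"Only try TCP SYNs" style runs would be a matter of picking a single member instead of a ceiling; the waiting period of a day or two between stages would sit in whatever schedules these plans.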
I was planning to use Python, because it's fast (in terms of coding time),
we don't need to worry about portability in this instance, and it gives me
fewer headaches than C. And Java makes me want to set things on fire. James
Arthur Gosling, take it back.
For the indirect scanning methods, I believe these will be difficult to
entirely automate, but I plan to implement them so that they require as
little human interaction as possible. If any of them prove reliable, they
can be used as fallback methods when information concerning specific bridges
is needed immediately and there is a human willing to run the tests.
'''Project Timeline'''
'''July 2012'''
Two weeks of continued research and discussion until end of July.
'''August 2012'''
Four weeks for initial development phase. Beta tests should be deployed by
31 August, and gathered data saved for evaluation of testing methods.
'''September 2012'''
Four weeks for evaluation of data previously gathered from beta testing,
and continued development of bridge reachability testing tools. Alpha
release should be deployed by 30 September.
'''October 2012'''
Two weeks for final development, with a useable, automated bridge
reachability testing tool produced by 14 October. Two weeks for final
testing, data collection and report generation, and discussion of further
steps for integrating the automation of bridge reachability testing with
general Tor metrics.
'''November 2012'''
The project should be completed by 1 November 2012.
== Active Questions: ==
1. Should this automation be considered part of OONI? Or BridgeDB? Or is
   it part of some other project?
2. If there are only eight "edge routers":
   a. What are their IP addresses?
   b. Which protocols return traceroute data for these routers?
   c. Is the "core router" on this side of the "edge routers", or the
      other?
   d. What is the usual TTL of packets from the probes?
3. For how long is an IP hijacked by the GFC probe?
4. Roger mentions that "if the bridge had no other interesting services
   running (like a webserver), they just blackholed the IP address...but
   if there was an interesting service, they blocked the bridge by IP and
   port." Do the probes enumerate all ports, just common ones, or just
   privileged ports?
5. What percentage of current bridges are running on port 443?
6. Does the GFC automatically flag connections to TLS/SSL services which
   did not previously complete a DNS resolve?
   a. If so, (because most browsers cache DNS resolutions) what is the
      max time interval between the last successful clientside DNS
      resolution and a client's request for the GFC to remember that DNS
      was resolved?
   b. Do connections made directly to IP addresses on port 443 stand out
      due to a lack of DNS resolution?
7. Does the GFC queue all TLS/SSL connections for later enumeration?
----
'''References'''
[1] "How China Is Blocking Tor". Winter, Philipp, and Lindskog, Stefan.
Karlstad University, Sweden (2011). p.7, section 5.1
http://www.cs.kau.se/philwint/pdf/torblock2012.pdf
[2] Ibid. p.6, section 4.2.
[3] Ibid. p.19, section 6.3.
[4] "Research problem: Five ways to test bridge reachability".
Dingledine, Roger. The Tor Project (2011).
https://blog.torproject.org/blog/research-problem-five-ways-test-bridge-reachability
[5] "Case study: Learning whether a Tor bridge is blocked by looking at
its aggregate usage statistics".
Loesing, Karsten. The Tor Project (2011).
https://metrics.torproject.org/papers/blocking-2011-09-15.pdf
[6] "Level Four Traceroute". http://pwhois.org/lft/
[7] "ipidseq.nse - nmap script for globally sequential IP ID discovery"
http://nmap.org/nsedoc/scripts/ipidseq.html
[8] "Idle Scan". http://nmap.org/book/idlescan.html
[9] "paketto". http://dankaminsky.com/2002/11/18/77/
[10] "Research problems: Ten ways to discover Tor bridges". Dingledine,
Roger. The Tor Project (2011). Point #10.
https://blog.torproject.org/blog/research-problems-ten-ways-discover-tor-bridges
[11] "Tor Protocol Specification". Dingledine, Roger, and Mathewson,
Nick.
The Tor Project (2012). Sections 2-4.
https://gitweb.torproject.org/torspec.git/blob_plain/HEAD:/tor-spec.txt
[12] "GFW probes based on Tor's SSL cipher list".
https://trac.torproject.org/projects/tor/ticket/4744
[13] "get_mozilla_ciphers.py - Get the default ciphers of Mozilla
Firefox".
https://trac.torproject.org/projects/tor/attachment/ticket/4744/get_mozilla_ciphers.py
[14] "EFF's SSL Observatory". https://www.eff.org/observatory
[15] "SSLObservatory git repository".
https://git.eff.org/public/observatory.git
[16] "Iran blocks Tor; Tor releases same-day fix". Dingledine, Roger.
The Tor Project (2011).
https://blog.torproject.org/blog/iran-blocks-tor-tor-releases-same-day-fix
[17] "new tcp scan method". Sanfilippo, Salvatore. (1998).
http://seclists.org/bugtraq/1998/Dec/79
[18] "Ioerror's blockfinder git repository".
https://github.com/ioerror/blockfinder
[19] "Zombie Scans using Unintended Public Services".
http://blog.makensi.es/post/3884103946/zombie-scans-using-unintended-public-services
[20] "idlescanner.py - Use unintentional web services for portscanning".
http://makensi.es/tools/idlescanner.txt
[21] "FTP Bouncing for Portscanners - FTP PROXY".
http://nmap.org/nmap_doc.html#bounce
[22] "How the Great Firewall of China is Blocking Tor". Winter, Philipp.
Karlstads Universitet (2012). http://www.cs.kau.se/philwint/static/gfc/
[23] "NullHypothesis' tcis git repository".
https://github.com/NullHypothesis/tcis
[24] "OONI - chinatrigger.py - Python port of tcis".
https://github.com/hellais/ooni-probe/blob/master/ooni/plugins/chinatrigger.py
[25] "An Analysis of Fragmentation Attacks". Anderson, Jason. (2001).
http://www.ouah.org/fragma.html
[26] "bridget.py".
https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/plugins/bridget.py
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6414>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online