[tor-dev] Automating Bridge Reachability Testing (#6414)

Sat Oct 13 08:08:37 UTC 2012

On Sat 13 Oct 2012 at 00:08, thus spake Jacob Appelbaum:
> Isis:
> > Hi Karsten!
> > 
> > Oh sheesh. I did not see it...I will have to figure out why. That is
> > slightly worrying.
> > 
> > So, I am rushing to meet the final deadline, but I still think it is
> > doable. I have mostly finished up my OONI work for the month, and I
> > planned to spend the remainder of this month working on the bridge
> > test.
> > 
> > I have finished most of the actual Tor connection code, as well as
> > one of the four basic packet level scans (the icmp8 one). Two of the
> > other packet level scans, the TCP SYN and ACK ones, are pretty much
> > copies of the icmp8 one with a couple lines changed, and they
> > shouldn't be a big deal.
> 
> A TCP SYN scan seems rather straightforward and quite useful. The ACK
> scan is weird if only because I'm not clear on how it would work - you'd
> have a bridge emit an ACK to a client? Wouldn't that fail for everything
> that doesn't have a real IPv{4,6} address? All NAT clients would fail,
> right? There are tricks to add an item to the NAT state table upstream
> that won't leak out to the larger network - so we could work around it...
> 

Derp, /facepalm. s/ACK/FIN/

Although, you're right! We could do neat things with having the OP/testpoint
send a SYN to a fixed IP, then have the bridge send a SYN/ACK back with the
sourceIP set to the same fixed IP, the same way that pwnat thing does it with
ICMP8 and time exceeded packets.
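
For anyone following along, the icmp8 probe mentioned above boils down to
building an ICMP echo request (type 8) with a valid Internet checksum. The
sketch below uses only the standard library and is not the code from
bridget.py; the names internet_checksum and build_icmp8 are made up for
illustration. Actually sending the packet needs a raw socket and root, so
that part is left out.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's complement of the one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp8(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0). Hypothetical helper."""
    # The checksum is computed with the checksum field set to zero first.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

A receiver validates such a packet by running the same checksum over the
whole thing and expecting zero.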

> > 
> > There is still the vanilla TLS handshake test/scan/thing, which has
> > not been started yet, and will take a bit more time than the others
> > because Python notoriously has problems with SSL bindings and
> > libraries, so I'll need to do a bit of research on newer ones and
> > updates and see which is the best to use now. I hear that tlslite[1]
> > is the current best choice; if anyone else has any input on this it
> > would be very helpful. :)
> > 
> 
> My thought is that txtorcon is what you'd want here - implementing a Tor
> client in Python is madness. I mean, I'm all for the madness but you
> can't actually do very much with such a vanilla handshake - you can open
> a TLS connection with a few lines of tlslite - that though is basically
> it. You might as well just use any python tls library for that though.
> tlslite is awesome but hardly anyone actually ships with it.
> 

I already used txtorcon, and wrote the full Tor connection case. It's here[1].
I want to see what happens when the OP pretends to be simply connecting to any
normal TLS/SSL service instead of Tor. It's important to know if they are
blocking TLS completely, or fingerprinting something in Tor specifically.
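
To make that distinction concrete, here is a rough sketch using only
Python's standard ssl module (not tlslite, and not what is in bridget.py).
The names vanilla_tls_probe and interpret, and the result strings, are made
up for illustration. Certificate checks are deliberately switched off, on
the assumption that only the handshake outcome matters for reachability.

```python
import socket
import ssl


def vanilla_tls_probe(host: str, port: int, timeout: float = 10.0) -> str:
    """Try a plain TLS handshake; return 'tcp-failed', 'tls-failed' or 'tls-ok'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp-failed"
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # assumption: we only care whether
    ctx.verify_mode = ssl.CERT_NONE   # the handshake completes at all
    try:
        with ctx.wrap_socket(sock, server_hostname=host):
            return "tls-ok"
    except OSError:                   # ssl.SSLError is a subclass of OSError
        return "tls-failed"
    finally:
        sock.close()


def interpret(vanilla: str, tor_handshake_ok: bool) -> str:
    """Combine the vanilla TLS result with the full Tor connection result."""
    if vanilla == "tcp-failed":
        return "ip:port unreachable"
    if vanilla == "tls-failed":
        return "TLS itself appears blocked"
    if not tor_handshake_ok:
        return "TLS works, Tor does not: likely Tor-specific fingerprinting"
    return "reachable"
```

The second function is the interesting part: the vanilla probe only becomes
informative once it is compared against the full Tor connection result for
the same ip:port.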

> > There were a couple of minor hiccups:
> > 
> > 1) When George asked me to test pluggable transports, this required 
> > significantly more refactoring than I previously thought was
> > necessary.
> > 
> > 2) Arturo redesigned the OONI testing framework API again to use a 
> > completely different structure, which was supposed to be backwards 
> > compatible and turned out not to be (though I believe that my recent
> > OONI commits fixed that). However, I have been fighting the framework
> > already, because the main scripts in OONI (/ooni/oonicli.py and
> > /ooni/ooniprobe.py) control the reactor, and also expect static
> > iterations through single test and single control functions for each
> > asset (an asset in this case would be one bridge address). The bridge
> > testing is rather dynamic (I would like it to be able to evaluate an
> > approximate danger level to running the next test) and so the
> > framework is kind of troublesome. Also, because the framework handles
> > calling the reactor (in Twisted, the reactor is a sort of event 
> > scheduler), and it also expects a rather linear progression of 
> > defer.Deferreds (in Twisted, those are standin objects which execute 
> > callbacks when they get results from some previous
> > deferred/callback), it would be nicer if I had full control of these
> > myself without needing to hack around the parent scripts. I think
> > it's wise that OONI deals with these things for the testwriter in
> > most cases, because the testwriter shouldn't be expected to be an
> > expert in using Twisted. However, I also think that, in the long
> > term, OONI shouldn't prohibit people who know what they are doing or
> > are doing odd things from being able to do so. As a result, I've
> > decided (for now) to use bits and parts of the OONI code before the
> > recent refactoring, and later (after the deliverable) I will work on
> > adding flags to OONI to give the test script full control of the
> > reactor and deferreds, as well as evaluating whether or not the
> > bridge test is even compatible with the new API. I do not want to get
> > caught up in dealing with this right now, I just want to have it all
> > working and deployable in a way that I know will work.
> > 
> 
> It seems like OONI needs to learn what you want to do and to help you to
> do it. The notion that you know what you're doing is correct and OONI
> should do what you're doing for you - so other people, who wish to do
> the same, can just do it the OONI way...
> 

Right, but there is a case to be made for simplicity. Which is why I was
thinking that it should handle these by default and then require extra flags
to hand control back to the testwriter.

> > 3) The indirect scans are becoming quite complicated to automate in
> > any sane fashion. I still would like to continue working on this, as
> > I'm quite enjoying the difficulty, but due to their temporary and
> > volatile nature (they will change frequently depending on the
> > blocking methods of a particular country and the currently available
> > in-country bounces/proxies/whatever-thing-the-indirect-scan-uses), as
> > well as the fact that many of these methods are still undiscovered, I
> > think it is safe to add them as specialty cases after the fact
> > without impacting overall general testing. There is one in particular
> > that I would like to finish before the deadline because I am quite
> > proud of it and am having a lot of fun working on it, but I'm first
> > going to concentrate on wrapping up the active scans.
> > 
> 
> I think at this point - perhaps I'm wrong - that merely having txtorcon
> try to connect through a bridge and download a file with
> trivsocks-client or something similar, is a perfectly fine test.
> 

But this burns bridges in places where Tor is blocked. I want to test *from
blocked countries* without their damned DPI boxes catching me, and I want to
automate it in a way they can't catch!

> > There are other things which I've marked as helpful things to do, but
> > which are not necessarily part of this deliverable:
> > 
> > 1) Having a parser for bridge descriptors to turn them into test
> > inputs, and vice versa.
> 
> In an ideal world, I think a list of ip:port fingerprint would be a good
> bet. Realistically, I think just having ip:port is also fine - we're
> talking about reachability testing - in theory, if Tor can build a
> circuit, we're happy. Even if there was a man in the middle, we wouldn't
> really care, right? If it can reach the Tor network, we still win... :)
> 

Yep! 

I've just realised that I'm not sure about the protocol for an OP connecting
for the first time...and acking torspec.git for 'directory authority'
obviously just gave me way too many results. I'm assuming that Tor has the
dirauths' public keys baked in, and thus checks the consensus signatures when
they come in. Is this right? 

So, provided you actually have a non-tampered Tor binary, and provided your
region/ISP/govt isn't blocking the dirauths by IP, then we know that if the
sigs check out okay on the consensus and you can reach a listed OR that you're
actually connected. So we don't really care about the fingerprints here,
except to tell the bridges apart, but then we can do that anyway by IP:Port.
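
As a sketch of the simple input format discussed above (ip:port with an
optional fingerprint), something like the following would do. It does not
parse full bridge descriptors; BridgeInput, parse_bridge_line and
format_bridge_line are hypothetical names, and IPv6 addresses are assumed to
be written in brackets.

```python
import ipaddress
import re
from typing import NamedTuple, Optional


class BridgeInput(NamedTuple):
    address: str
    port: int
    fingerprint: Optional[str]


def parse_bridge_line(line: str) -> BridgeInput:
    """Parse 'ip:port [fingerprint]'; raise ValueError on malformed input."""
    parts = line.split()
    if parts and parts[0].lower() == "bridge":
        parts = parts[1:]  # tolerate a leading torrc-style keyword
    if not parts:
        raise ValueError("empty bridge line")
    host, sep, port_text = parts[0].rpartition(":")
    if not sep:
        raise ValueError("missing port")
    address = str(ipaddress.ip_address(host.strip("[]")))
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    fingerprint = None
    if len(parts) > 1:
        if not re.fullmatch(r"[0-9A-Fa-f]{40}", parts[1]):
            raise ValueError("fingerprint must be 40 hex characters")
        fingerprint = parts[1].upper()
    return BridgeInput(address, port, fingerprint)


def format_bridge_line(bridge: BridgeInput) -> str:
    """Inverse of parse_bridge_line."""
    host = "[%s]" % bridge.address if ":" in bridge.address else bridge.address
    line = "%s:%d" % (host, bridge.port)
    if bridge.fingerprint:
        line += " " + bridge.fingerprint
    return line
```

Since the fingerprint is optional, bridges can still be told apart by
(address, port) alone.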

> > 
> > 2) Having some undiscoverable method for setting up lots of IPv6
> > bridges on one OR (Tor currently only allows up to eight, I believe)
> > and having these be discoverable by bridgedb and no one else. I was
> > thinking of this while talking with Aaron, because he reminded me
> > that people on IPv6 have tons of IPs available, and I was thinking
> > that if we configured some type of one-way hash function, we could
> > say that a bridge descriptor for 2001:db8::1:1 should actually mean
> > multiple descriptors for 2001:db8::fa98:38d2 2001:db8::e099:2188
> > 2001:db8::88aa:3b7 or something, derived from the output of hashing
> > the original descriptor with the OR's key or something else. This
> > would help distribute bridges in the future quite a bit, though it
> > doesn't do much for the current bridge situation.
> > 
> > Anyone wanting to help with the above two things, or with an idea for
> > another indirect scan, or with feedback on anything I'm working on,
> > should feel free to contact me and it will be greatly appreciated.
> > :D
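
The one-way hash idea in item 2 above could look roughly like this. It is
only a sketch under loud assumptions: HMAC-SHA256 stands in for "hashing
with the OR's key or something else", only the published address (not the
whole descriptor) is hashed, and the low 32 bits of the address are the part
that gets replaced, as in the 2001:db8:: example. derive_addresses is a
made-up name.

```python
import hashlib
import hmac
import ipaddress
from typing import List


def derive_addresses(base: str, key: bytes, count: int) -> List[ipaddress.IPv6Address]:
    """Derive `count` addresses sharing the top 96 bits of `base`."""
    base_ip = ipaddress.IPv6Address(base)
    prefix = (int(base_ip) >> 32) << 32  # keep the top 96 bits
    derived = []
    for index in range(count):
        # Keyed hash over the published address plus a counter, so each
        # derived address differs and nobody without the key can compute it.
        message = base_ip.packed + index.to_bytes(4, "big")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        suffix = int.from_bytes(digest[:4], "big")
        derived.append(ipaddress.IPv6Address(prefix | suffix))
    return derived
```

With this shape, anything that is supposed to discover the extra addresses
would need the same key, which is the open design question.
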
> 
> I think the indirect scan stuff doesn't really make a lot of sense.
> Unless by indirect, you still mean that alice (in country x) is talking
> to bob (the bridge) on various protocols other than the single TCP port
> that is a Tor bridge listener.
> 

Really? I think it makes the most sense for certain countries...

You're right that a lot of the indirect scans will only tell us if the
Bridge's ORport is open, and not if the Bridge is actually up and running and
able to accept clients, but in countries where Tor is blocked, clandestinely
obtaining that information in a non-fingerprintable manner combined with a
full Tor connection from a non-blocked country tells us that the Bridge is in
fact up and running and that, at the time of the scan, the ORport was
reachable from the censoring country. The trick is to do the indirect scan in
a way that the DPI boxes cannot catch, otherwise we might as well just be
doing a full Tor connection and burning the Bridge.
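
Spelled out as code, the reasoning in the paragraph above is just a small
decision table over two observations. The function name and the result
strings are made up for illustration.

```python
def combine_vantage_points(full_tor_ok_outside: bool,
                           orport_open_inside: bool) -> str:
    """Combine a full Tor connection from an unblocked country with an
    indirect ORport check from inside the censoring country."""
    if not full_tor_ok_outside:
        # Without the outside check, an open port says nothing about
        # whether the bridge is actually accepting clients.
        return "bridge down or unknown"
    if orport_open_inside:
        return "bridge up, ORport reachable from censoring country"
    return "bridge up, ORport not reachable from censoring country"
```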

> I imagine indirect to mean that you try to, say, traceroute to the
> upstream network where bob is known to be located. That doesn't tip
> anyone off about bob at all - not to the remote network, nor to the
> local network or the networks in between.
> 

No...that wouldn't work...or maybe it would if there winds up being some
strange case of a government blocking entire IP ranges. I've not heard of that
happening, have you? That seems inefficient, and like it would break more
things than it would "fix" (from the censor's POV) -- but then I wouldn't put
it past governments to do the first dumbass thing that appears to "fix" their
"problem".

China, for example, blocks by IP -- unless they find services running on the
box (as would be the case for a host with multiple vhosts), in which case
they block the offending service by IP:port. So I don't think scanning the
neighbouring netblock tells us anything.

<(A)3
isis agora lovecruft

[1] https://gitweb.torproject.org/ooni-probe.git/blob/HEAD:/ooni/plugins/bridget.py