[tor-dev] Automating Bridge Reachability Testing (#6414)

Sat Oct 13 00:08:06 UTC 2012

Isis:
> Hi Karsten!
> 
> Oh sheesh. I did not see it...I will have to figure out why. That is
> slightly worrying.
> 
> So, I am rushing to meet the final deadline, but I still think it is
> doable. I have mostly finished up my OONI work for the month, and I
> planned to spend the remainder of this month working on the bridge
> test.
> 
> I have finished most of the actual Tor connection code, as well as
> one of the four basic packet-level scans (the icmp8 one). Two of the
> other packet-level scans, the TCP SYN and ACK ones, are pretty much
> copies of the icmp8 one with a couple of lines changed, and they
> shouldn't be a big deal.
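For reference, and assuming "icmp8" means an ICMP type 8 (echo request) probe, such a packet is just an 8-byte header plus payload protected by the RFC 1071 Internet checksum. The sketch below builds one with only the standard library. The function names are hypothetical and not taken from the OONI code. Actually sending the packet would need a raw socket (SOCK_RAW with IPPROTO_ICMP) and root privileges, which is not shown.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type 8, code 0


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Return an ICMP echo request (type 8) with a valid checksum."""
    # Header layout: type, code, checksum, identifier, sequence number.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, ident, seq)
    return header + payload


packet = build_icmp_echo_request(ident=0x1234, seq=1, payload=b"probe")
print(len(packet), internet_checksum(packet))  # prints: 13 0 (a valid packet sums to 0)
```

The receiving side can validate a reply the same way: re-running the checksum over a correct packet, checksum field included, yields 0.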

A TCP SYN scan seems rather straightforward and quite useful. The ACK
scan is weird, if only because I'm not clear on how it would work - you'd
have a bridge emit an ACK to a client? Wouldn't that fail for everything
that doesn't have a real IPv{4,6} address? All NAT clients would fail,
right? There are tricks to add an item to the NAT state table upstream
that won't leak out to the larger network - so we could work around it...
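In the conventional (nmap-style) form of both scans, the probing client sends the probe, not the bridge. For a SYN scan the client sends a bare SYN: SYN/ACK back means open, RST means closed, and silence or an ICMP unreachable means filtered. For an ACK scan the client sends an unsolicited ACK. Open and closed ports both answer with RST, so a reply only shows that the packet got through any filter. It measures filtering, not whether the bridge is listening. A minimal sketch of that interpretation step, with hypothetical function names, taking the reply's TCP flag byte (or None when nothing came back):

```python
from typing import Optional

# TCP header flag bits (RFC 793).
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


def classify_syn_scan(reply_flags: Optional[int], icmp_unreachable: bool = False) -> str:
    """Interpret the reply to a bare SYN sent to the bridge's ORPort."""
    if reply_flags is None or icmp_unreachable:
        return "filtered"  # dropped or rejected somewhere on the path
    if reply_flags & SYN and reply_flags & ACK:
        return "open"  # something answered the first half of a handshake
    if reply_flags & RST:
        return "closed"  # host reachable, nothing listening on that port
    return "unexpected"


def classify_ack_scan(reply_flags: Optional[int], icmp_unreachable: bool = False) -> str:
    """Interpret the reply to an unsolicited ACK; says nothing about open/closed."""
    if reply_flags is None or icmp_unreachable:
        return "filtered"
    if reply_flags & RST:
        return "unfiltered"  # open and closed ports both answer with RST
    return "unexpected"


print(classify_syn_scan(SYN | ACK), classify_syn_scan(RST | ACK), classify_ack_scan(None))
# prints: open closed filtered
```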

> 
> There is still the vanilla TLS handshake test/scan/thing, which has
> not been started yet, and will take a bit more time than the others
> because Python notoriously has problems with SSL bindings and
> libraries, so I'll need to do a bit of research on newer ones and
> updates and see which is the best to use now. I hear that tlslite[1]
> is the current best choice; if anyone else has any input on this it
> would be very helpful. :)
> 

My thought is that txtorcon is what you'd want here - implementing a Tor
client in Python is madness. I mean, I'm all for the madness, but you
can't actually do very much with such a vanilla handshake: you can open
a TLS connection with a few lines of tlslite, and that is basically it.
You might as well use any Python TLS library for that. tlslite is
awesome, but hardly anyone actually ships it.
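Along those lines, the ssl module in Python's standard library is enough to check whether a bridge completes a plain TLS handshake. A minimal sketch, assuming the goal is only "does the handshake finish" and not speaking the Tor link protocol. Certificate and hostname checks are switched off because Tor relays and bridges present self-generated certificates rather than CA-signed ones. The function names are hypothetical.

```python
import socket
import ssl


def make_bridge_tls_context() -> ssl.SSLContext:
    """Client context that accepts the bridge's self-generated certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def tls_handshake_ok(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection and TLS handshake to host:port complete."""
    ctx = make_bridge_tls_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # wrap_socket performs the handshake immediately by default.
            with ctx.wrap_socket(sock) as tls:
                return tls.version() is not None
    except OSError:  # covers refused/timed-out connections and ssl.SSLError
        return False
```

Usage would be something like tls_handshake_ok("192.0.2.10", 443) for a bridge at that (documentation-range) address. A False result alone does not tell a TCP-level block apart from a TLS-level one. That distinction is what the packet-level scans are for.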

> There were a couple of minor hiccups:
> 
> 1) George asked me to test pluggable transports, which required
> significantly more refactoring than I had expected.
> 
> 2) Arturo redesigned the OONI testing framework API again to use a 
> completely different structure, which was supposed to be backwards 
> compatible and turned out not to be (though I believe that my recent
> OONI commits fixed that). However, I have been fighting the framework
> already, because the main scripts in OONI (/ooni/oonicli.py and
> /ooni/ooniprobe.py) control the reactor, and also expect static
> iterations through single test and single control functions for each
> asset (an asset in this case would be one bridge address). The bridge
> testing is rather dynamic (I would like it to be able to evaluate an
> approximate danger level to running the next test) and so the
> framework is kind of troublesome. Also, because the framework handles
> calling the reactor (in Twisted, the reactor is a sort of event 
> scheduler), and it also expects a rather linear progression of 
> defer.Deferreds (in Twisted, those are stand-in objects which execute
> callbacks when they get results from some previous
> deferred/callback), it would be nicer if I had full control of these
> myself without needing to hack around the parent scripts. I think
> it's wise that OONI deals with these things for the testwriter in
> most cases, because the testwriter shouldn't be expected to be an
> expert in using Twisted. However, I also think that, in the long
> term, OONI shouldn't prohibit people who know what they are doing or
> are doing odd things from being able to do so. As a result, I've
> decided (for now) to use bits and pieces of the OONI code from before
> the recent refactoring, and later (after the deliverable) I will work on
> adding flags to OONI to give the test script full control of the
> reactor and deferreds, as well as evaluating whether or not the
> bridge test is even compatible with the new API. I do not want to get
> caught up in dealing with this right now, I just want to have it all
> working and deployable in a way that I know will work.
> 
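The "approximate danger level" idea above can be expressed independently of who controls the reactor. A minimal sketch, written synchronously rather than as a chain of Twisted Deferreds so the control flow is easy to read. The risk scores, the budget, and every name here are hypothetical. In OONI each run callable would presumably return a Deferred instead of a bool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ProbeStep:
    name: str
    risk: float              # hypothetical score: how conspicuous this probe is
    run: Callable[[], bool]  # returns True if the bridge looked reachable


def run_adaptive(steps: List[ProbeStep], risk_budget: float) -> Dict[str, str]:
    """Run probes from least to most risky.

    Stops as soon as one probe shows the bridge is reachable (no need to
    escalate), and skips any probe that would push the cumulative risk
    past the budget.
    """
    results: Dict[str, str] = {}
    spent = 0.0
    for step in sorted(steps, key=lambda s: s.risk):
        if spent + step.risk > risk_budget:
            results[step.name] = "skipped"
            continue
        spent += step.risk
        if step.run():
            results[step.name] = "reachable"
            break  # a quieter probe already answered the question
        results[step.name] = "unreachable"
    return results


steps = [
    ProbeStep("tls-handshake", 0.5, lambda: True),
    ProbeStep("icmp8", 0.1, lambda: False),
    ProbeStep("tcp-syn", 0.2, lambda: True),
]
print(run_adaptive(steps, risk_budget=1.0))
# prints: {'icmp8': 'unreachable', 'tcp-syn': 'reachable'}
```

Keeping the decision logic in a plain function like this would also make it easy to hand over to whichever framework ends up owning the reactor.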

It seems like OONI needs to learn what you want to do and help you do
it. The premise that you know what you're doing is correct, and OONI
should learn to do what you're doing on your behalf - so that other
people who wish to do the same can just do it the OONI way...

> 3) The indirect scans are becoming quite complicated to automate in
> any sane fashion. I still would like to continue working on this, as
> I'm quite enjoying the difficulty, but due to their temporary and
> volatile nature (they will change frequently depending on the
> blocking methods of a particular country and the currently available
> in-country bounces/proxies/whatever-thing-the-indirect-scan-uses), as
> well as the fact that many of these methods are still undiscovered, I
> think it is safe to add them as specialty cases after the fact
> without impacting overall general testing. There is one in particular
> that I would like to finish before the deadline because I am quite
> proud of it and am having a lot of fun working on it, but I'm first
> going to concentrate on wrapping up the active scans.
> 

At this point I think - perhaps I'm wrong - that merely having txtorcon
try to connect through a bridge, and then downloading a file with
trivsocks-client or something similar, is a perfectly fine test.
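A sketch of what such a test could hand to Tor, assuming a vanilla (non-pluggable-transport) bridge given as ip:port with an optional fingerprint. UseBridges, Bridge, SocksPort and DataDirectory are standard torrc options. The helper names, the SOCKS port and the data directory are hypothetical. Whether the bridge works can then be read off Tor's "Bootstrapped N%" status messages, with 100% meaning Tor finished bootstrapping, before fetching a file through the SOCKS port.

```python
import re
from typing import Optional

BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")


def bridge_torrc(addr_port: str, fingerprint: Optional[str] = None,
                 socks_port: int = 19050,
                 data_dir: str = "/tmp/bridge-test") -> str:
    """Build a minimal torrc that forces Tor to use exactly one bridge.

    socks_port and data_dir are hypothetical defaults; use a fresh data
    directory per test so cached state from other bridges is not reused.
    """
    bridge = f"Bridge {addr_port}" + (f" {fingerprint}" if fingerprint else "")
    lines = [
        f"SocksPort {socks_port}",
        f"DataDirectory {data_dir}",
        "UseBridges 1",
        bridge,
    ]
    return "\n".join(lines) + "\n"


def bootstrap_percent(log_line: str) -> Optional[int]:
    """Extract N from a Tor 'Bootstrapped N%' log or status line, if present."""
    match = BOOTSTRAP_RE.search(log_line)
    return int(match.group(1)) if match else None


print(bridge_torrc("192.0.2.10:443"))
print(bootstrap_percent("[notice] Bootstrapped 100%: Done"))  # prints: 100
```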

> There are other things which I've marked as helpful things to do, but
> which are not necessarily part of this deliverable:
> 
> 1) Having a parser for bridge descriptors to turn them into test
> inputs, and vice versa.

In an ideal world, I think a list of ip:port fingerprint entries would be
a good bet. Realistically, just having ip:port is also fine - we're
talking about reachability testing - in theory, if Tor can build a
circuit, we're happy. Even if there was a man in the middle, we wouldn't
really care, right? If it can reach the Tor network, we still win... :)
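For the parser item, a minimal sketch of turning "ip:port [fingerprint]" lines into test inputs and back. It assumes IPv6 addresses are written in brackets, as in torrc Bridge lines, and that fingerprints are 40 hex digits without spaces. BridgeInput and the function names are hypothetical. This does not parse full bridge descriptors, only the short line format.

```python
import ipaddress
import re
from typing import NamedTuple, Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
FINGERPRINT_RE = re.compile(r"[0-9A-Fa-f]{40}")


class BridgeInput(NamedTuple):
    address: IPAddress
    port: int
    fingerprint: Optional[str] = None


def parse_bridge_line(line: str) -> BridgeInput:
    """Parse 'ip:port [fingerprint]'; IPv6 must be bracketed, e.g. [2001:db8::1]:443."""
    parts = line.split()
    if len(parts) not in (1, 2):
        raise ValueError(f"expected 'ip:port [fingerprint]', got {line!r}")
    host, sep, port = parts[0].rpartition(":")
    if not sep:
        raise ValueError(f"missing port in {parts[0]!r}")
    bracketed = host.startswith("[") and host.endswith("]")
    address = ipaddress.ip_address(host[1:-1] if bracketed else host)
    if address.version == 6 and not bracketed:
        raise ValueError("IPv6 addresses must be written as [addr]:port")
    port_num = int(port)
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range: {port_num}")
    fingerprint = None
    if len(parts) == 2:
        if not FINGERPRINT_RE.fullmatch(parts[1]):
            raise ValueError(f"bad fingerprint: {parts[1]!r}")
        fingerprint = parts[1].upper()
    return BridgeInput(address, port_num, fingerprint)


def format_bridge_line(bridge: BridgeInput) -> str:
    """Inverse of parse_bridge_line (fingerprints come back upper-cased)."""
    host = f"[{bridge.address}]" if bridge.address.version == 6 else str(bridge.address)
    line = f"{host}:{bridge.port}"
    return f"{line} {bridge.fingerprint}" if bridge.fingerprint else line
```

Requiring brackets for IPv6 avoids the ambiguity in something like 2001:db8::1:443, which could be read as either an address with port 443 or a bare address.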

> 
> 2) Having some undiscoverable method for setting up lots of IPv6
> bridges on one OR (Tor currently only allows up to eight, I believe)
> and having these be discoverable by bridgedb and no one else. I was
> thinking of this while talking with Aaron, because he reminded me
> that people on IPv6 have tons of IPs available, and I was thinking
> that if we configured some type of one-way hash function, we could
> say that a bridge descriptor for 2001:db8::1:1 should actually mean
> multiple descriptors for 2001:db8::fa98:38d2, 2001:db8::e099:2188,
> 2001:db8::88aa:3b7, or something, derived from the output of hashing
> the original descriptor with the OR's key or something else. This
> would help distribute bridges in the future quite a bit, though it
> doesn't do much for the current bridge situation.
> 
> Anyone wanting to help with the above two things, or with an idea for
> another indirect scan, or with feedback on anything I'm working on,
> should feel free to contact me and it will be greatly appreciated.
> :D
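One hypothetical way to realize the hashing idea is to derive extra addresses inside the bridge's /64 from an HMAC-SHA256 keyed with a secret shared by the bridge and BridgeDB. That secret stands in for "the OR's key", so only key holders can compute the list. This is a sketch of the idea under those assumptions, not an existing Tor or BridgeDB feature. The function name, the /64 prefix, and the message format are made up for illustration.

```python
import hashlib
import hmac
import ipaddress
from typing import List


def derive_bridge_addresses(base: str, key: bytes, count: int,
                            prefixlen: int = 64) -> List[ipaddress.IPv6Address]:
    """Derive `count` extra addresses in base's /prefixlen from HMAC-SHA256(key, ...).

    Anyone without `key` sees only `base`; anyone holding it can recompute the list.
    """
    base_addr = ipaddress.IPv6Address(base)
    network = ipaddress.IPv6Network(f"{base}/{prefixlen}", strict=False)
    host_mask = (1 << (128 - prefixlen)) - 1
    if count > host_mask:
        raise ValueError("prefix too small for that many derived addresses")
    derived: List[ipaddress.IPv6Address] = []
    counter = 0
    while len(derived) < count:
        message = f"{base_addr}|{counter}".encode()
        digest = hmac.new(key, message, hashlib.sha256).digest()
        # Keep only the host bits so the result stays inside the bridge's prefix.
        candidate = network.network_address + (int.from_bytes(digest, "big") & host_mask)
        if candidate != base_addr and candidate not in derived:
            derived.append(candidate)
        counter += 1
    return derived


for addr in derive_bridge_addresses("2001:db8::1:1", key=b"hypothetical-shared-secret", count=3):
    print(addr)
```

Because the derivation is deterministic, the bridge and BridgeDB would agree on the same list without ever publishing it. Rotating the key would rotate the whole set.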

I think the indirect scan stuff doesn't really make a lot of sense,
unless by indirect you still mean that alice (in country x) is talking
to bob (the bridge) on various protocols other than the single TCP port
that is a Tor bridge listener.

I imagine indirect to mean that you try to, say, traceroute to the
upstream network where bob is known to be located. That doesn't tip
anyone off about bob at all - not the remote network, nor the local
network, nor the networks in between.
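A sketch of that kind of indirect check: choose a probe target in bob's upstream network that is not bob himself, then compare the outcome with the direct test. The /24 prefix length is an assumption. In practice the prefix would come from routing data such as the announced prefix. The traceroute itself is not shown, and the names are hypothetical.

```python
import ipaddress
import random
from typing import Optional


def pick_upstream_target(bridge_ip: str, prefixlen: int = 24,
                         rng: Optional[random.Random] = None):
    """Pick a random address in the bridge's assumed upstream prefix, never the bridge."""
    bridge = ipaddress.ip_address(bridge_ip)
    network = ipaddress.ip_network(f"{bridge_ip}/{prefixlen}", strict=False)
    if network.num_addresses < 4:
        raise ValueError("prefix too small to pick a distinct neighbour")
    rng = rng or random.Random()
    while True:
        # Offsets 1..n-2 skip the network address and the last address of the prefix.
        candidate = network.network_address + rng.randrange(1, network.num_addresses - 1)
        if candidate != bridge:
            return candidate


def interpret(upstream_reachable: bool, bridge_reachable: bool) -> str:
    """Combine the indirect (upstream) result with the direct bridge test."""
    if bridge_reachable:
        return "bridge reachable"
    if upstream_reachable:
        return "path to the upstream network works; the bridge itself looks blocked"
    return "upstream network unreachable; the bridge cannot be singled out"
```

The point of the comparison is the middle case: if the neighbourhood is reachable but bob is not, blocking is probably aimed at bob specifically rather than at his whole network.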

All the best,
Jake