[tor-talk] Clarification of Tor's involvement with DARPA's Memex

Sun Apr 19 23:26:37 UTC 2015

On Fri, Apr 17, 2015 at 05:38:37PM +0100, Thomas White wrote:
> there is some references to DARPA collaborating with some
> developers from Tor Project. I'd like to ask the developers of Tor to
> clarify what this involvement entails and why effort is being put
> towards a LE tool instead of working on hiding Tor users through
> improving anonymity or developing more circumvention based-tech.

Hi Thomas,

Thanks for asking. I apologize for not explaining all of this
earlier. I'm still trying to find the right balance for my time between
mentoring people in the Tor community and communicating more broadly.

Let me give you some background, and then I'll answer your question.

First of all, yes indeed we've been getting some funding from the
Memex project. This is what has allowed us to pay attention to and move
forward on some of the really cool things we've been working on lately
for hidden services:

* Fixing many performance and consistency problems with hidden services,
e.g.:
https://trac.torproject.org/projects/tor/ticket/11447
https://trac.torproject.org/projects/tor/ticket/13211
https://trac.torproject.org/projects/tor/ticket/13447
https://trac.torproject.org/projects/tor/ticket/13700
https://trac.torproject.org/projects/tor/ticket/14219
https://trac.torproject.org/projects/tor/ticket/14224

* Fleshing out the design and analysis for the "direct onion service"
option that folks like Facebook want:
https://lists.torproject.org/pipermail/tor-dev/2015-April/008625.html
plus discussing other tradeoffs between upcoming design choices:
https://lists.torproject.org/pipermail/tor-dev/2015-April/008597.html

* The work to let Tor controllers configure a hidden service directly
without using the torrc file, which the Globaleaks folks (among others)
are really excited to start using:
https://trac.torproject.org/projects/tor/ticket/6411

* The privacy-preserving statistics that let us conclude numbers like
"3-4% of Tor traffic is hidden service related" and "there are around
30000 hidden services today":
https://blog.torproject.org/blog/some-statistics-about-onions

* Assessing, triaging, and recently putting out new Tor releases to fix
hidden service security (stability) bugs:
https://blog.torproject.org/blog/tor-02512-and-0267-are-released

* I hear that Rob Jansen and others have been working on a more realistic
replacement for TorPerf (https://gitweb.torproject.org/torperf.git)
which will let us measure performance to a hidden service over time and
better understand where the bottlenecks are.

* I've also been talking to EFF about kicking off a Tor Onion Challenge
(to follow on from their Tor Relay Challenges), to a) get many people
to make their website or other service accessible as an onion site,
and b) come up with and/or build a novel use of onion services, to go
with the quite cool list that we have already but have done a poor job
of publicizing: Pond, Globaleaks, SecureDrop, Ricochet, OnionShare,
Facebook's https onion, etc. You see, I used to be on the "making your
normal website reachable as an onion service is stupid" side of the fence,
but I have since come to realize that I was wrong. You know how, ten
years ago, website operators would say "I don't need to offer https for
my site, because my users ____" and they'd have some plausible-sounding
excuse? And now they sound selfish and short-sighted if they say that,
because everybody knows it should be the choice of the *user* what
security properties she gets when reaching a service? I now think onion
services are exactly in that boat: today we have plenty of people saying
"I don't need to offer a .onion for my site, because my users _____". We
need to turn it around so sites let their *users* decide what security
(encryption, authentication, trust) properties they want to achieve
while interacting with each site.

Our "3-4%" stat has actually been used by some of the other people (at
other groups) who are funded by Memex. They're talking to (among others)
the child porn division of the Department of Justice, and I've taught them
enough about Tor that they've basically turned into Tor advocates on our
behalf. They've found actual numbers to be really useful at countering the
FUD that some government people start out with. One of these people
explained to me last week that government audiences listen to her more
than she thinks they'd listen to me, since she shows up as a neutral
party. In any case I am happy to have more people working on the "teach
law enforcement how Tor actually works" topic, which you can read more
about here:
https://blog.torproject.org/blog/trip-report-tor-trainings-dutch-and-belgian-police
https://blog.torproject.org/blog/trip-report-october-fbi-conference

We do indeed need to be very careful and very thoughtful about what
things in the Tor network are safe to measure. The general heuristic we've
been using so far is: "Is that measurement taking advantage of something
that you could instead fix? If so, it's not ok to measure it." A prime
example here of what's over the line is running relays that get the
HSDir flag and then recording what hidden service descriptors they see
(and thus what hidden services they learn about). We would instead like
to treat that as a vulnerability and fix it:
https://trac.torproject.org/projects/tor/ticket/8106
https://trac.torproject.org/projects/tor/ticket/8243
https://trac.torproject.org/projects/tor/ticket/8244
and see also the "Attacks by Hidden Service Directory Servers" section of
https://blog.torproject.org/blog/hidden-services-need-some-love
as well as the section after it. (There are other researchers who have
used that technique, e.g.
http://freehaven.net/anonbib/#oakland2013-trawling and also Gareth Owen's
talk at 31c3. But we need to hold ourselves to a higher standard.)

On the other hand, if people publish a .onion address on a normal
website and Google runs across it and indexes the name, then it seems
clear that that's public information. There are many other ways to learn
about hidden service names which are ethically in-between, e.g.
http://blogs.verisigninc.com/blog/entry/new_from_verisign_labs_measuring1
These are great topics for us as a community to keep discussing.

Similarly, if your .onion address is public, and your webserver
doesn't require any authentication, and somebody fetches the content
on it... that also seems like public information. And if, for example,
the onion service is a forum, and users go there and then write their
names down or provide other identifying information, that isn't really
a bug or design flaw that Tor can fix.

These days there are services like Ahmia that list and index a bunch of
onion names and content:
https://ahmia.fi/search/
And to be clear, I think this is a great trend: we need to make onion
services easier to understand and more accessible (and faster and more
robust) for ordinary people, or we'll remain stuck with all the metaphors
that include the word 'dark'.

Ok, now that I've provided some background, I should try to answer your
question more clearly: we're using the Memex money to make hidden services
stronger, and we're teaching other people how Tor works. In terms of
teaching, it's the same thing I do for every other audience: explain
all the projects Tor works on (Tor, Tor Browser, pluggable transports,
metrics, OONI, ...), which projects do what, how to measure and assess
Tor's anonymity, what problems we don't have great answers for, and so on.

Part of making Tor work better means that it works better for these people
too. And some of these people are indeed working on tools to gather and
organize public content from hidden services, with the intent that groups
like law enforcement will find their tools useful. We're not working
on these tools, but when Tor becomes better (for everybody) these tools
work better (for the groups they have in mind). It is a tricky balance,
but I think we have the balance right in this case.

Would I rather have funding where it's easier to find a good balance?
Absolutely. That's a major part of why we've been talking about funding
and funding diversity so much lately, and why we've been thinking about
crowdfunding specifically for hidden service design improvements, and
about growing our donor base and making Tor more sustainable through
donations and other avenues. We need help from all of you to get there.

I don't want to play the "they'd do it anyway" card too strongly here --
first because who knows, maybe they wouldn't, and second because there
are definitely some activities that you stay away from no matter the
balance. I've talked a lot with the program manager of Memex, and he's
completely supportive of the "don't weaken Tor" mandate. In that sense
we're aligned: he very strongly believes that weakening Tor would screw up
this balance. I trust his intentions, and in any case we're the ones doing
the technical side of Tor so we can make sure that we do the right thing.

I should also make clear my opinion on some of the bad uses of Tor.
The folks who are using Tor for child porn, even though they are a tiny
fraction of overall Tor users, are greatly hurting Tor -- by changing
or reinforcing public perceptions of what privacy is for, and also
by attracting the attention and focus of law enforcement and making
that the way that law enforcement first learns about Tor. So, fuck
them, they should get off our network, that's not what Tor is for and
they're hurting all of us. Now, that doesn't mean we should weaken Tor,
even if we don't want them on the network. That slope is too easy to
slip down, and we must not get into the business of dictating what
is acceptable behavior for Tor users (which would eventually lead to
designing technical mechanisms to enforce these choices).

I just went back to re-read the Forbes article, and in retrospect it
sure makes it look like all of these companies are working on tools that
relate to Tor hidden services. They aren't. The main focus for Memex is
on automatically parsing and collecting info from ads on e.g. craigslist,
and generally getting better at the 'big data' side of searching and
organizing this data. More generally, Memex is made up of a bunch of
different companies, each doing their thing. I guess this is another
casualty of the ambiguity of the phrases 'dark web' and 'deep web',
since journalists find them hot to talk about but nobody reliably knows
what they refer to.

If you want to follow along with the actual technical work we're doing,
I invite you to observe or participate in the periodic "SponsorR"
meetings that happen on IRC:
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
http://meetbot.debian.net/tor-dev/

Thanks,
--Roger