[tor-talk] Roger's status report, June 2012

Roger Dingledine arma at mit.edu
Sat Jul 7 15:41:15 UTC 2012


(What is this mail? See my explanation from the May mail I sent:
https://lists.torproject.org/pipermail/tor-talk/2012-June/024572.html )

------------------------------------------------------------------------

Here's what I said at the beginning of June that I hoped to do:

> - Participate in the Q2 Tor directors board meeting, including approving
> the updated 2012 budget.

Done. We now have a budget for the rest of 2012, including new positions
we want to fill -- but see below.

> - Understand the open positions in our current budget, what funders
> each one maps to, and what the priorities are in terms of spending the
> money. Then we can start putting some calls-for-resumes-and-code-samples
> up on the website, as well as prioritizing which calls we want to do
> when. Unless I ignore all of this until July.

I made a map of funders-to-expenses, to try to get a handle on how much
money from each grant could go towards new hires/contractors. Alas,
that's not sufficient -- we need to know not just how much we planned to
spend on the grant, but how much we have actually billed to the funder
so far on each grant. Melissa is our official juggler-of-these-numbers:
she makes sure all of our invoices are legit so we can continue to pass
our non-profit audits. I need to get up-to-date numbers from her before
we can plan for the future. I hope to do this at the dev meeting in July.

(Aren't you glad you get to learn about the operations details for Tor? :)

> - Get 0.2.3.16-alpha out. Get 0.2.2.37 out. Get 0.2.3.17-alpha out.

Done. We even called 0.2.3.17 a beta, and put out an 0.2.3.18-rc too.

We also moved Debian Wheezy to Tor 0.2.3.x, since Wheezy is stabilizing
soon and we want to be sure to have the new Tor version in it. That
means everybody who uses the torproject.org debs got upgraded to 0.2.3.x
without much warning. Hopefully it went smoothly.

> - Orchestrate the FOCI discussion and select the program.

Done. We ended up picking 10 of the 20 papers:
https://www.usenix.org/conference/foci12/tech-schedule/workshop-program
I think we have a great selection of people and papers. Please drop
by if you're in the Seattle area in August!

> - Tell Micah Sherr and Chris Wacek (Georgetown) about the open
> simulation questions; and get Rob Jansen (UMN/NRL), Mashael AlSabah
> (Waterloo), etc a good summary of the current situation.

Not done. But I now have a third target audience -- Nick and Andrea want
to learn which performance-related dev tasks they should be thinking
about. That's great, because "write Tor patches that we can try in the
simulators" is a big component of the upcoming performance work.

> - Read and consider http://microsoftjobsblog.com/zen-of-pm so I can
> help Adam Shostack help us get a good project manager.

Read; now mulling it over.

Initial thoughts are that this is not the right context for Tor --
the MS PM position has a huge "understand the space and come up with a
competitive product" component, whereas we have plenty of competitive
products and not enough coordination helping us make them reality.

Or said another way, the projects in question are already imagined,
sold, and have a funder; now we need to follow through.

That said, a lot of the properties described in the article are useful
to have in people who make sure we follow through. And in the future, we
need more people to imagine and "sell" more research/development projects.
So I sure wouldn't *mind* having more of these 'program manager' people.

Adam will be at PETS in Vigo, and I'll talk to him more about the
topic there.

> - I have a three hour slot at the SponsorF meeting this month. I'm going
> to try to bring everybody there up to speed on everything. While also
> letting other people talk for most of the time. Preparing for this talk
> will be a big part of my June.

Done. I put my slides here:
https://svn.torproject.org/svn/projects/presentations/slides-june2012.pdf

I'm afraid they don't work particularly well as stand-alone slides
without me talking next to them. I'll try to get the information from
them down into blog posts sometime; but alas, many other items are higher
on the list.

The talk went very well -- we got a lot of the other researchers asking
questions and making comments, and we ended up running out of time.

I feel a little bit bad that I have twice now left the "attacks on
anonymity" slides for the end, and twice now run out of time to cover
them well. If they schedule me for another slot next time, I should be
sure to start with them.

One of the nice things about working with this group of professors and
grad students is that they're excited to talk to people who a) have a
good handle on the real-world problems and adversaries, and b) know how to
present that information in a way that makes sense to security academics.
Basically every follow-on talk at the meeting had a reference to something
from my talk in it.

> - Meet with Kevin Dyer's lab at Portland State before the SponsorF
> meeting. Rob Jansen and Aaron Johnson (NRL) will be joining me.

Done. We talked about the state of website fingerprinting attacks in the
literature -- most studied attacks look at the "closed world" question
where you know that the user went to 1 of n web pages (chosen uniformly
at random), and you try to guess which one. It comes down to how similar
the page she picked is to the other pages she might have picked.
http://freehaven.net/anonbib/#oakland2012-peekaboo

The first issue I raised is that all these website fingerprinting attacks
are going about it the wrong way, at least for Tor. They look at the
packet level, and try to train their machine learning algorithms on
various properties of the packets. Seems to me that since we know it's
the Tor protocol underneath, what we should do is reconstruct the TCP
stream, and learn how many Tor cells are sent in each direction. Then see
which web pages in our set would use that number of cells. This approach
should lead to much simpler attacks, since it's just a question of "how
many pages fall into the same bucket as our target page?" rather than
doing machine learning over a trace of IP packets.
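
A rough sketch of the bucket idea, assuming we already have per-page cell
counts in each direction. The page names and counts are made up for
illustration, and the function names are hypothetical:

```python
from collections import defaultdict

def bucket_pages(cell_counts):
    """Group pages by their (cells_sent, cells_received) signature."""
    buckets = defaultdict(set)
    for page, signature in cell_counts.items():
        buckets[signature].add(page)
    return buckets

def candidates(cell_counts, observed):
    """Return the pages whose cell counts match an observed (sent, received) pair."""
    return sorted(bucket_pages(cell_counts).get(observed, set()))

# Hypothetical measurements: page -> (cells sent, cells received)
EXAMPLE_COUNTS = {
    "page-a": (12, 340),
    "page-b": (12, 340),
    "page-c": (9, 87),
}
```

Here candidates(EXAMPLE_COUNTS, (12, 340)) returns both page-a and page-b,
so the attacker can't tell them apart, while (9, 87) identifies page-c
uniquely. A real attack would likely need some tolerance around the counts
rather than exact matches.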

The approach should also lead to a new set of defenses to test: try
to make the set of pages that would use the same number of cells as
the target page ("collisions") as big as possible. This might be done
for example by having the entry guard add padding cells to "snap up"
to the next largest common size.
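
One way to picture "snapping up", as a sketch only: the list of common
sizes is an assumption (somebody would have to measure it), and snap_up
and padding_needed are hypothetical names, not anything in Tor:

```python
import bisect

def snap_up(cell_count, common_sizes):
    """Return the smallest common size that is >= cell_count.

    Counts larger than every common size are returned unchanged.
    """
    sizes = sorted(common_sizes)
    i = bisect.bisect_left(sizes, cell_count)
    return sizes[i] if i < len(sizes) else cell_count

def padding_needed(cell_count, common_sizes):
    """Number of padding cells to add to reach the snapped-up size."""
    return snap_up(cell_count, common_sizes) - cell_count
```

With hypothetical common sizes of 100, 400, and 1000 cells, a page that
uses 87 cells would get 13 padding cells, and a page that uses 101 cells
would get 299. The tradeoff is the bandwidth spent on padding versus the
number of pages that end up in each bucket.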

The second issue was one that Mike Perry keeps raising: just because
your closed-world data set doesn't include any web page similar to your
target page doesn't mean that your attack will work well in the wild. In
practice, there could be many other pages that the user visits, and if you
don't know about them, you don't know what your false positive rate is.
Knowing that there's an unknown level of inaccuracy should give at least
some attackers pause.
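
To see why the unknown false positive rate matters so much, here is a
small worked calculation. The rates below are hypothetical, not numbers
from any of the papers:

```python
def precision(true_positive_rate, false_positive_rate, base_rate):
    """Fraction of the attacker's alarms that are actually the target page.

    base_rate is the fraction of all page loads that are the target page.
    """
    hits = true_positive_rate * base_rate
    false_alarms = false_positive_rate * (1.0 - base_rate)
    return hits / (hits + false_alarms)
```

With a 90% true positive rate, a 1% false positive rate, and a target page
that accounts for 1 in 1000 page loads, precision(0.9, 0.01, 0.001) comes
out to roughly 0.08, so about 11 of every 12 alarms would be wrong. An
attacker who can't estimate the false positive rate can't compute this
number at all.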

In fact, all the stats from the papers show how likely you are to
guess correctly, but they don't explore the distribution of which pages
are often guessed wrong and which are often guessed right. We need to
explore what properties of pages make them more/less likely to collide
with many other pages. It could even turn into a set of recommendations
for website authors -- "how to write your website in a way that doesn't
make your Tor users especially vulnerable to website fingerprinting".

Kevin acknowledged these issues as 'future work'. Some time I'll flesh
out the above two paragraphs and turn them into a blog post, to try to
draw more attention to this very important research area.

> - Help prepare for the SponsorF site visit that will occur a few days
> before PETS. We'll need to provide slides/etc, and likely even call in
> and do the phone presentation thing.

Started to prepare. I need to write a set of slides that will work
well even if somebody else presents them.

> - Go to Stamford CT to do a Tor talk for one of Ian's past students.
> http://privacyandsecurity.pbiresearch.com/agenda.html

Done. I used a subset of the "june2012" slides from above. It turns
out the talk slot was only 35 minutes, so I ended up smashing together
fractions of three different talks. I'm told it went well.

I also got a chance to chat with Ian about the vulnerability of
obfsproxy's "obfs2" protocol to smarter DPI approaches. I think we need
a new obfs3 protocol that uses ECDHE handshakes, so passive on-path
attackers can't look for redundancy in the protocol (just because
each byte in the obfs2 traffic flow is uniformly random doesn't mean
that the flow as a whole has no redundancy). I'll file a bug sometime
in July to expand on this paragraph, for those who didn't understand it.
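
In the meantime, here is a toy illustration of the parenthetical. It
assumes, as a simplification, a handshake where the key is derived solely
from a seed sent in the clear. The magic constant, key derivation, and
keystream are stand-ins invented for this example and are not the ones in
the obfs2 spec:

```python
import hashlib
import os
import struct

MAGIC = 0x12345678  # hypothetical constant, not the value from the obfs2 spec

def keystream(key, length):
    """Toy keystream: SHA-256 in counter mode (stand-in for a real stream cipher)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + struct.pack(">Q", counter)).digest()
        counter += 1
    return out[:length]

def xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))

def derive_key(seed):
    """The key depends only on the seed, which travels in the clear."""
    return hashlib.sha256(b"toy-pad-key" + seed).digest()

def client_hello(payload):
    """Every byte on the wire looks random: a random seed, then ciphertext."""
    seed = os.urandom(16)
    header = struct.pack(">I", MAGIC) + payload
    return seed + xor(header, keystream(derive_key(seed), len(header)))

def passive_detect(flow):
    """A passive observer repeats the client's own key derivation."""
    if len(flow) < 20:
        return False
    seed, body = flow[:16], flow[16:20]
    plain = xor(body, keystream(derive_key(seed), 4))
    return struct.unpack(">I", plain)[0] == MAGIC
```

passive_detect(client_hello(b"...")) is always True, even though each byte
of the flow is uniformly distributed. The redundancy is in the relation
between the first 16 bytes and the next 4. With a key agreement that the
observer can't compute, there would be nothing for passive_detect to
derive the key from.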

> - Write an abstract for the ecrypt talk I'm doing at the workshop
> before PETS:
> https://www.cosic.esat.kuleuven.be/ecrypt/provpriv2012/invited.html

Done:

"""
Title:
Tor, real-world attackers, and (un)provable privacy

Abstract:
Tor's approach to threat models is to try to understand the capabilities
of realistic attackers we expect to encounter, rather than picking
adversaries our protocols can withstand. This strategy has led us to
deploy systems that are not amenable to security proofs. Or to say it
even more strongly, we deploy provably _insecure_ systems relative
to real-world adversaries, because they're still the safest ones we
can deploy.

In this talk I'll explain some realistic attacks against Tor's anonymity
and blocking-resistance properties, and discuss some reasons why it's
hard to produce accurate and useful models for these attacks (and thus
hard to prove things about them).
"""

I tried not to present it as "you people who do proofs are useless to
us", because I want to draw them in and help them realize that the real
world is messy and hard to model cleanly. It's a long shot.

> - Fly to Florence, for the Tor developers meeting and hackfest in July.
> https://trac.torproject.org/projects/tor/wiki/org/meetings/2012SummerDevMeeting
> https://trac.torproject.org/projects/tor/wiki/org/meetings/2012FlorenceHackfest

Done. I also had a nice chat with Gunner about our internal politics
and landmines he should be aware of, so he can facilitate the meeting
more effectively. More in July on how it goes.

> - Launch a working-group of pluggable transport developers and
> researchers, and make sure they all know about each other.

Done. I also set up a webpage with links as I know them right now:
https://www.torproject.org/docs/pluggable-transports
Please let me know if I missed anything!

> - Help SponsorF come up with metrics by which the SponsorF Red Team
> will judge the project's success.

Here are some early thoughts on two "claims" we should explore. The
idea is that we describe security properties we think we can back up
with our tools, and then they analyze them and try to find contradictions
and vulnerabilities.

1) Tor: "the adversary can't learn which user is communicating with
which destination."
Except: if he can see/measure the traffic flow between the user and
the Tor network, and also the traffic flow between the Tor network and
the destination.
Except: lots of other subtle things from the various anonbib papers
that it's probably not worth the red team's time to explore, since
anonymity researchers have already worked on them for years.

2) Obfsproxy: "the adversary can't DPI for the flows made by
obfs2, when it's using the shared-secret extension (#3 at
https://gitweb.torproject.org/obfsproxy.git/blob/HEAD:/doc/obfs2/protocol-spec.txt#l96)
That is, the most the adversary can learn from the bytes in the flows
is that they're random."
Except: after the handshake, timing and volume characteristics are still
like the underlying flow (Tor in this case).

More broadly, here are some notes on a wide variety of metrics we might
consider for each component in our blocking-resistance world, with more
emphasis on Tor's components and less emphasis on components I'm hoping
other projects will make progress on:

1) tor
  anonymity: "chance the adversary can learn that alice is talking to bob"
  computed by measuring...
    diversity of relay location against traffic confirmation
    anonymity against selective DoS attacks
    anonymity against website fingerprinting attacks
    ... against many other anonymity attacks
    each wrt an adversary with various capabilities
  diversity of user types ("deniability")
    within the "nearby" subset of users
  performance of network (bw, jitter, latency, etc)
    mean/median, but also consider long tails
  load on network (how many users, what they are doing)

2) obfuscating transport
  how much "similar" background traffic exists
  how similar it is
    relative to how much scrutiny by the adversary
  pain from blocking false positives ("value" of background traffic)
    ...over what timescales
  space and computational efficiency for embedding/disembedding

3) rate-limited credential distribution (e.g. how we use bridgedb)
  how many honest users we can support for a given adversary level
  how much work we require the adversary to do per credential he gets
  ratio of credentials we can support (bridge addresses we have) to
    credentials we can afford the adversary to get

4) scanning-resistance / being an innocent service until credential is shown
  [related questions to the obfuscating transport]

5) address allocation strategy (reachability testing)
  for various strategies, how quickly do addresses get blocked?
    with what distribution?
  how much does our reachability testing help the adversary?

------------------------------------------------------------------------

Here are some other things I did in June:

Reviewed Philipp Winter's upcoming FOCI paper "How the Great Firewall
of China is Blocking Tor" and gave him comments. You can read more
at http://www.cs.kau.se/philwint/static/gfc/ or wait for the revised
version of the paper.

Met with Nathan Freitas (of the Guardian Project) in NYC, to discuss the
state of Tor on mobile, and let him know about the upcoming pluggable
transports we're going to experiment with. So far it's not entirely
trivial to put Python-based Tor components on Android, but the variety
of pluggable transports that are going to be Python-based in the near
future means he's going to have to add Python support at some point.

Got up to speed on Tor design proposals 188-191, and sent comments
to tor-dev. I should read / comment on the later proposals one day too.

Talked to Nadia Heninger about her student Deepika Gopal's thesis at
UCSD entitled "Torchestra: Reducing interactive traffic delays over
Tor". It's like Rob Jansen's "Throttling Tor Parasites" paper, except
she proposes to have each pair of relays use several TCP sessions in
parallel, and migrate circuits between the sessions based on how loud
they are. Then TCP can push back on the session full of loud circuits,
while we still read freely from the session full of polite circuits. More
analysis required, but it sounds worth exploring.
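
A minimal sketch of how I picture the circuit migration, assuming just two
sessions per relay pair and a moving average of recent cell counts as the
measure of "loud". The threshold, smoothing factor, and names are my
assumptions, not details from the thesis:

```python
from dataclasses import dataclass

@dataclass
class Circuit:
    circ_id: int
    ewma: float = 0.0  # smoothed cells per tick

    def observe(self, cells, alpha=0.5):
        """Fold the latest per-tick cell count into the moving average."""
        self.ewma = alpha * cells + (1 - alpha) * self.ewma

def assign_sessions(circuits, loud_threshold):
    """Split circuits between a 'polite' and a 'loud' TCP session."""
    sessions = {"polite": [], "loud": []}
    for circ in circuits:
        name = "loud" if circ.ewma > loud_threshold else "polite"
        sessions[name].append(circ.circ_id)
    return sessions
```

Re-running assign_sessions every so often would move a circuit between
sessions as its average crosses the threshold.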

Had a chat with a network security friend who pointed me at
http://code.google.com/p/appid/
which uses a huge regexp to identify traffic flows by protocol. Apparently
each DPI vendor has their own version of this tool. It's really easy to
make it think that a given flow is http, since it doesn't aim to resist
attacks like our obfuscating transports. It would be neat to make an
automated pluggable transport that uses this regexp to automatically
generate flows that appid thinks are a given protocol -- not with the
intent of building a perfectly indistinguishable flow, but rather with
the intent of driving up the false positives and uncertainty.
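
To make the idea concrete, here is a toy classifier in the same spirit.
The signatures are written for this example and are not taken from appid,
and a hand-written prefix stands in for the automatic generation step:

```python
import re

# Hypothetical signatures, written for this example; not taken from appid.
SIGNATURES = [
    ("http", re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n")),
    ("ssh", re.compile(rb"^SSH-2\.0-")),
]

def classify(first_bytes):
    """Return the name of the first signature matching the start of a flow."""
    for name, pattern in SIGNATURES:
        if pattern.match(first_bytes):
            return name
    return "unknown"

def disguise_as_http(payload):
    """Prefix an HTTP-looking request so the classifier above says 'http'."""
    return b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" + payload
```

Random-looking bytes classify as "unknown", while the same bytes passed
through disguise_as_http classify as "http". Fooling a real DPI box would
of course take more than a prefix, but the point is the false positives,
not a perfect disguise.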

The same friend also pointed me at
http://jon.oberheide.org/0trace/
which can apparently traceroute over an established TCP connection --
it might be perfect for trying to figure out where China's GFW bridge
probes are coming from. I passed the link on to Philipp and George,
who I hope will do something neat with it.

Talked to Nick Feamster about posting a job advertisement for him -- he
wants to hire a researcher/developer to work on Bismark, which is quite
related to OONI. We (Tor) need to figure out our policy for posting job
descriptions on behalf of other parties -- it seems to me that at least
in this case we should do it.

------------------------------------------------------------------------

Here are some items I expect to do in July:

- Attend the Dev meeting and hack fest in Florence. Help everybody
understand our upcoming grants, and the upcoming deliverables that
go with them.

- Attend PETS plus do a talk at the 'provable privacy' workshop in Vigo.

- Probably go to Berkeley for the last week in July.

- Summarize open simulation tickets and open performance tickets, so we
can prioritize them and get more developer attention on them.

- Publicize one or more new job openings on our jobs page:
https://www.torproject.org/about/jobs.html.en
and start collecting applications.

- Make sure our new core dev gets added to the people page, and make
sure we do some sort of announcement so there's closure. Follow up on
the original core dev job announcements to say we've got one (but leave
the job announcement up, because we wouldn't mind having another if the
perfect person came along).

- Ian told me that Tariq's "Changing of the Guards" paper was flawed. I
don't yet agree that it's flawed -- I should follow up with them and see
which parts of the design need to be discarded and which I can resurrect.

- Get Tor 0.2.3.x closer to stable.

- Organize and announce (hopefully in that order) our upcoming plans
for encouraging more exit relays.

- Track down all the plans for my November trip to Amsterdam. The original
plan was to speak in Rotterdam at their CA conference (organized after
the DigiNotar thing), but that expanded to maybe talking to Dutch law
enforcement, and then maybe Austrian law enforcement, and now Belgian
law enforcement wants me to come explain the Internet to them too. All of
these things are worth doing (the more law enforcement groups understand
Tor, the less they hassle our exit relay operators and the less they
lobby for laws to outlaw privacy), but we'll see how many I can fit in.

- Start looking into properties we want for a more DPI-resistant "obfs3"
protocol.

------------------------------------------------------------------------

Things I'm still dropping the ball on:

- Transparently document the secteam process, especially since we have
decided to use it far less often and only for critical security things.

- Answer the thread between Karsten and Jake where we had an excited
volunteer with a clearly useful contribution that we totally dropped on
the floor. Try to generalize the experience to improve our response to new
contributors. We used to be great at it, and lately we're all overloaded.

- Add a "scientific papers" exception to our trademark-faq: I want to give
blanket permission to scientific papers to use the word Tor in their paper
name, so long as they don't go and write software under that name too.
https://www.torproject.org/docs/trademark-faq

- Make a plan for fixing all the "CBT sometimes breaks Tor" issues.
https://trac.torproject.org/projects/tor/ticket/3443

- Start summarizing Tor research papers on the blog more regularly. There
have been a huge number of really important research papers lately,
and most Tor people don't know about them. Should I summarize them on
the blog (for a broader audience), or on tor-dev (for the rest of the
Tor developers), or what?

- I need new business cards.

- Get https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorA
through D back up on the wiki somewhere (Andrew took them down since
they were concluded, and since they just listed contract deliverables
rather than the progress reports and trac ticket links that we've been
doing for later funders; but we should keep them there for posterity).


