[tor-dev] Should popularity-hiding be a security property of hidden services?
George Kadianakis
desnacked at riseup.net
Fri Apr 3 14:57:33 UTC 2015
Hello,
Tor hidden services are meant to primarily provide server anonymity
but they also provide various other properties. For example, their
addresses are self-authenticated and their connections punch NAT. This
post is about another property, which is that Tor does not reveal the
popularity of a hidden service by default. That is, you can't easily
get the user count of a specific hidden service.
This is not that surprising to hidden service operators, since that's
also how the normal Internet works. In the normal Internet, someone
cannot learn the user count of an IP or website, except if they are
the operator or they control DNS or the site publishes analytics.
Over the past years, people have suggested various features that would
provide us with interesting information and optimizations but would
have the side effect of revealing the user count of hidden services
(or only of popular hidden services) to the public.
Some examples of such popularity-leaking features:
- Hidden Services dynamically set their number of Introduction Points
depending on their popularity. Basically, they self evaluate their
popularity, and use a formula to decide the number of Introduction
Points between 3 and 10. For a while, we thought that this formula
does not work properly (#4862, #8950), but we recently discovered
that it seems to be working in some manner.
While an interesting and useful feature on its own, it has the side
effect that it leaks how popular your hidden service is. Since a
hidden service publishes a descriptor every hour, you can monitor
hourly usage patterns of hidden services. Of course, you can't get
the exact user count, but you might be able to get rough approximate
numbers (we still haven't analyzed the formula enough to know
*exactly* how much).
- During our recent work on hidden service statistics [0] people have
suggested gathering statistics that would get us closer to learning the
number of hidden service users [1]. The suggested way to do so is to
have HSDirs or introduction points count the total number of
introductions or descriptor fetches and publish that number in their
extra-info descriptor. Since given a hidden service address you can
easily learn its HSDirs or introduction points (IPs), it should be possible to map those
statistics to specific hidden services, which would leak their
popularity (more on this later).
There might be more examples that I'm missing, but this should be
enough to demonstrate the leaks. For the rest of this post, I will be
presenting various arguments for and against leaking popularity.
Disclaimer: I still am not 100% decided here, but I lean heavily
towards the "popularity is private information and we should not
reveal it if we can help it" camp, or maybe the "there need to be
very concrete positive outcomes before even considering leaking
popularity" camp. Hence, my arguments will be obviously biased towards the
negatives of leaking popularity. I invite someone from the opposite
camp to articulate better arguments for why popularity-hiding is
something worth sacrificing.
== Arguments for leaking popularity and reaping its benefits ==
Here are a few arguments that people use to shrug off popularity-hiding.
I can relate to some of them, but I find the reasoning of some others
funny or even dangerous.
- "If we don't care about leaking popularity we can get useful statistics"
Indeed, we've dismissed various statistics that we could collect
because we were afraid that they would leak the popularity of hidden
services. If we didn't have this fear, we would have a better idea
of how much usage hidden services see, or whether people are
conducting DoS attacks on hidden services.
- "If we don't care about leaking popularity we can get nice optimizations"
As an example, the dynamic introduction point calculation is one of those optimizations.
I'm not aware of other optimizations, but I bet that we can think of
a few more if we completely remove popularity-hiding from our threat
model. Also, people have claimed that more statistics would reveal
more optimizations that we could do.
- "Popularity-hiding is just a side-effect of the Tor protocol, and
not a stated security goal"
People have claimed that popularity-hiding is not a stated security
goal of hidden services, or that the name "hidden services" does not
imply popularity-hiding in any way.
- "There are no realistic attacks that could happen from leaking popularity"
People have claimed that popularity is just a curiosity, and nothing
bad can come from leaking it. They say that protecting popularity
does not offer security against realistic or dangerous attacks.
Other people claim that popularity-revealing attack vectors are too
noisy and contain too much random data, hence it's hard to get
targeted popularity values out of them. They say that it might only
be possible for very popular hidden services, or for unlikely edge
cases.
- "There are probably other ways to reveal popularity. You can't fix them all"
That's actually a big fear of mine. That we are nitpicking about 2-3
popularity revealing vectors, while there are hundreds more
currently open. See #8742 for example, but I bet there are more
vectors that we need to think about.
== Arguments for protecting popularity ==
And here are arguments that make me believe that popularity is
something that should be protected.
- Popularity attracts attention
Anonymity likes uniformity, but popularity attracts attention.
There are literally infinite possible use cases where a hidden
service wants to be public and still not attract attention.
However, since the above argument has not been particularly
successful and only attack demonstrations will persuade a true
skeptic's mind, here is an attack scenario:
Try hard to imagine a dystopian future where authorities are
tracking down and hacking activist websites. They just received a
big list of hidden services, the result of a messy interrogation,
but they are all locked. Their hackers can hack some of them but not
all. Not much time before revolution, end of dystopian future and
happiness for all humanity. The dictator needs to decide which
hidden services to hack to stop the revolution. Which??
With popularity being public, they can get the popularity of
the biggest ones and target those first.
- Popularity can be used to find patterns in group movements now and in the past.
Even though you can't track specific users using popularity, you can
still track groups of users. Also, these statistics are forever: even
if you didn't care about a group of users in the past, but you start
caring about them now, you can still look back and see their
development over time.
Here is an attack scenario:
Imagine a community that practices very dangerous urban climbing [2].
Imagine thousands of friends climbing away in happiness from all
over the world.
Imagine now that this community splinters into smaller
communities. If you monitor their popularity, it will be
possible for you to observe the movement of that subculture.
As a further point, imagine now that dystopian future comes and
very dangerous urban climbing gets outlawed. The police catch an
urban climber in New London and gets a list of hidden services from
her. They can then check _historically_ how many users those hidden
services had. They can basically notice all the trends of the urban
climbing scene in the past years. Creepy, no?
As you probably well know, anonymity is not a binary option. It's
not like you are either super anonymous or not. It's more of a
fuzzy variable that depends on many things. OPSEC is a big part of
anonymity, and it seems to me that popularity has OPSEC consequences.
- Statistics noise will get reduced. Attacks only get better.
In the statistics we were talking about, each HSDir would reveal the
number of descriptor fetches it received over the past day. We know
that each HSDir serves about 150 hidden services, which means that
the final value in the end will contain the popularity of 150 hidden
services in one number. This is expected to be extremely noisy, and
I think that's one of the main hopes of people who don't care about
popularity hiding. That allows them to claim that popularity will
only be leaked for very popular hidden services.
While this indeed seems reasonable, my main intuition is that
attacks can only get better. Here are some ways that noise can be
reduced. I will focus on the HSDir case, but the same arguments apply
to other suggested statistics like number of introductions per IP.
-- It's still early in the hidden services scene, so not many
services get lots of traffic. I imagine that many of those 150
hidden services are going to be very inactive, and not provide much
noise.
-- Hidden services publish hidden service descriptors to 6 HSDirs.
This means that every day you will learn 6 noisy values for
your target hidden service, not just 1. It's easier to remove noise
that way.
-- Also, those 150 irrelevant hidden services that provide the noise
will publish themselves to 6 HSDirs. Applying the same logic as
above, you might be able to learn information about the noise, which
makes it easier to remove. In a way, you can put all the statistics
measurements in a big system of equations, and start solving it to
reduce noise in the equation you are interested in.
-- Think of crazy edge cases. Maybe an introduction point is very
weak and unlikely to be picked and only got 10 HSes for a day. If
one of them is the hidden service you are interested in, there is
going to be much less noise than usual.
-- There might be other techniques for reducing noise, by combining
other statistics (like the number of hidden services per HSDir which
is already a stat), or by influencing the statistics yourself (like
Aaron's attack on the stats aggregation protocol [3]).
What I'm trying to say here, is that if you thought that the urban
climbing example was ridiculous because such a community cannot be
big enough to be visible in noisy statistics, maybe by reducing
noise you can actually make it distinguishable.
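As a rough illustration of the system-of-equations idea, here is a
minimal Python sketch on a made-up toy network. Everything in it is an
assumption for illustration only: three services (A, B and the target
T), three HSDirs (H1, H2, H3), two HSDirs per service instead of six,
fetches split evenly across a service's HSDirs, no reporting noise, and
an attacker who knows where every service is placed. None of the
function names come from Tor. Each published report is a sum over
several services, yet combining the reports isolates the target.

```python
from fractions import Fraction


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs exactly with Gauss-Jordan elimination.

    Assumes a square, nonsingular system (a toy simplification).
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix) or len(rhs) != n:
        raise ValueError("toy solver needs as many reports as services")
    # Build the augmented matrix [matrix | rhs] with exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("reports do not pin down every service")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def published_reports(placement, fetches):
    """What each HSDir would publish: the sum of its services' shares."""
    reports = {}
    for service, hsdirs in placement.items():
        share = Fraction(fetches[service], len(hsdirs))  # even split (assumed)
        for hsdir in hsdirs:
            reports[hsdir] = reports.get(hsdir, 0) + share
    return reports


def estimate_fetches(placement, reports):
    """Attacker side: recover per-service fetch counts from the reports."""
    services = sorted(placement)
    hsdirs = sorted(reports)
    # One equation per HSDir; a service contributes 1/len(its HSDirs) there.
    matrix = [[Fraction(1, len(placement[s])) if h in placement[s] else 0
               for s in services]
              for h in hsdirs]
    rhs = [reports[h] for h in hsdirs]
    return dict(zip(services, solve_linear_system(matrix, rhs)))


# Hypothetical toy network; the true counts are unknown to the attacker.
placement = {"A": ["H1", "H2"], "B": ["H2", "H3"], "T": ["H1", "H3"]}
true_fetches = {"A": 10, "B": 40, "T": 200}

reports = published_reports(placement, true_fetches)  # H1=105, H2=25, H3=120
estimate = estimate_fetches(placement, reports)
print(int(estimate["T"]))  # prints 200
```

In this toy case the target's count is simply H1 + H3 - H2 = 105 + 120
- 25 = 200. With about 150 services per HSDir and 6 HSDirs per service,
the real network has roughly 25 times more services than HSDirs, so a
single day of reports would leave the system underdetermined. The sketch
only shows why overlapping reports make noise removable in principle.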
- The benefits of ditching popularity-hiding are not that amazing.
To be honest, I have not heard convincing enough arguments that
would make me ditch popularity hiding. Some extra statistics or some
small optimizations do not seem exciting enough to me. Please try
harder. This could be a nice thread to demonstrate all the positive
things that could happen if we ditch popularity-hiding.
Also, there is a small difference here between the stats and the
introduction point formula. The dynamic introduction point formula
is something that we could disable by default, but also leave it as
a configurable option for people who want to use it. That is, it
will then be *the choice of the hidden service operator* whether he
cares about popularity being hidden or not. With the statistics that
have been proposed, you don't give any choice. You just do it for
all hidden services forever.
- Principle of least surprise
Hidden service operators expect that hidden services are at least as
secure as the normal Internet plus more. On the normal Internet,
popularity is private by default. Having this assumption violated on
hidden services might not be polite.
- Popularity-hiding is crucial to maintain the deep sea security model of hidden services
As I have mentioned in the past, some people think of the onion land
as a very deep ocean. In some places of the ocean, you might be able
to see some buoys (some more visible than others). To visit them,
you need to wear your goggles and your snorkel, dive in and enter
from underwater.
This might not seem like a very concrete security model, but in any
case popularity is not revealed at any point. The sea is opaque and
you can't see the divers entering the hidden services.
Anyway this post has grown to immense size, and I was really hoping it
would be shorter.
On a more practical note, over the next few weeks, we should decide
what we want to do with the dynamic introduction point formula and
whether we should keep it or not (#4862). My current intuition is that
it should be disabled but also kept there as an option for people who
want to enable it. In any case, I hope that this thread can stimulate
discussion.
Also, if you are a hidden service operator I'm curious to hear about
whether you believe that popularity hiding is a security property that
should be preserved if that's even possible.
Cheers!
[0]: https://blog.torproject.org/blog/some-statistics-about-onions
[1]: https://lists.torproject.org/pipermail/tor-dev/2015-February/008247.html
[2]: https://www.youtube.com/watch?v=kpS7vhvkIQM
[3]: https://lists.torproject.org/pipermail/tor-dev/2015-March/008404.html