[tor-dev] Some statistics on introduction point stability and correctness
George Kadianakis
desnacked at riseup.net
Sat Jul 18 16:05:49 UTC 2015
Hello,
during the past months we have been working on evaluating and
confirming the stability and correctness of Tor hidden services. The
hidden services protocol has multiple steps, and its soundness depends
on various components of the Tor network. Throughout this document, we
assume that the reader is familiar with the hidden services
protocol. For people who want to get started now, we suggest you read
the hidden services protocol description on our website [0].
During our evaluation, we continuously fetched the descriptors of a
few hidden services, and collected statistics on the introduction
points they picked. The hidden services we tested were selected
arbitrarily so that they have decent uptime and are moderately
used. We have labeled them alphabetically from (a) to (f). The data
was collected over a period of 90 days using tor-hs-descriptor-fetcher [1].
With our experiment we were trying to answer questions like:
Q: Do hidden services publish descriptors correctly and in the
intended time schedule?
Q: Is the introduction point codebase working properly? Are hidden
service introduction points as stable as we want them to be?
Q: How many introduction points do hidden services expose themselves to?
Q: Normally, hidden services will use 3 intro points. However if
they think they are getting too much traffic they will
dynamically adjust the number of their introduction points
according to a self-evaluation of their popularity. In this case,
very popular hidden services may use up to 10 introduction
points. Does this algorithm work as intended?
Let's start our analysis by looking at some graphs:
-----
[*] https://people.torproject.org/~asn/desc_stats/lifetimes-2015-07-14-b.png
This graph shows the number of descriptors published per hour by each
hidden service. Including descriptor replicas, a normal hidden
service would publish 2 descriptors per hour which indeed seems to be
the case most of the time according to the graphs. This is good since
it shows that our system works properly.
However, we also observe that every HS publishes more than 2
descriptors per hour at least 15% of the time. We believe this
occurs when an HS republishes its descriptor because of an expired or
dead intro point.
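To make the "descriptors per hour" measurement concrete, here is a
minimal sketch of how such a count could be computed. It assumes the
input is a plain list of publication timestamps; the function names and
the sample data are made up for illustration and are not taken from
tor-hs-descriptor-fetcher.

```python
from collections import Counter
from datetime import datetime


def descriptors_per_hour(timestamps):
    """Count observed descriptor publications in each clock hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def fraction_of_hours_above(counts, expected=2):
    """Fraction of observed hours with more than `expected` descriptors.

    Hours with no observed descriptor at all are not part of `counts`
    and are therefore ignored.
    """
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n > expected) / len(counts)


# Hypothetical sample: a normal hour (2 descriptors, one per replica)
# followed by an hour with a republish (4 descriptors).
sample = [
    datetime(2015, 5, 1, 10, 5), datetime(2015, 5, 1, 10, 5),
    datetime(2015, 5, 1, 11, 5), datetime(2015, 5, 1, 11, 5),
    datetime(2015, 5, 1, 11, 40), datetime(2015, 5, 1, 11, 40),
]
counts = descriptors_per_hour(sample)
print(fraction_of_hours_above(counts))
```

In this made-up sample, one of the two observed hours exceeds the
expected 2 descriptors.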
----
[*] https://people.torproject.org/~asn/desc_stats/lifetimes-2015-07-14-a.png
Normally hidden services will keep their introduction points for a
random lifetime between 18 and 24 hours. This graph shows how long
the measured hidden services kept their introduction points.
Looking at the three hidden services (a), (e) and (f) it seems that
introduction points indeed rotate as intended most of the
time. Specifically, we see that about 75% of the intro points of
those three hidden services indeed stay up for more than 18
hours. This reassured us that the introduction point rotation code
works well.
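As a rough illustration of how intro point lifetimes can be derived
from repeated descriptor fetches, the sketch below takes the lifetime
of an intro point to be the time between the first and the last fetch
in which it appeared. The input shape, the names and the sample data
are assumptions made for this example; it also assumes a relay is not
picked again after being dropped.

```python
from datetime import datetime, timedelta


def intro_point_lifetimes(observations):
    """Map each intro point to (last seen - first seen).

    `observations` is an iterable of (fetch_time, intro_point_ids) pairs.
    """
    first_seen, last_seen = {}, {}
    for fetch_time, intro_points in observations:
        for ip in intro_points:
            first_seen[ip] = min(first_seen.get(ip, fetch_time), fetch_time)
            last_seen[ip] = max(last_seen.get(ip, fetch_time), fetch_time)
    return {ip: last_seen[ip] - first_seen[ip] for ip in first_seen}


def fraction_longer_than(lifetimes, threshold=timedelta(hours=18)):
    """Fraction of intro points that were observed for more than `threshold`."""
    if not lifetimes:
        return 0.0
    return sum(1 for d in lifetimes.values() if d > threshold) / len(lifetimes)


# Hypothetical observations: intro point C is replaced early by D.
start = datetime(2015, 5, 1)
observations = [
    (start, {"A", "B", "C"}),
    (start + timedelta(hours=6), {"A", "B", "D"}),
    (start + timedelta(hours=20), {"A", "B", "D"}),
]
lifetimes = intro_point_lifetimes(observations)
print(fraction_longer_than(lifetimes))
```

Since the fetches are periodic, the measured lifetimes are only as
precise as the fetch interval.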
However, we can see that hidden services (b), (c) and (d) have lower
intro point lifetimes. We believe that this is caused by the dynamic
intro point formula which adjusts the number of introduction points
for popular hidden services. Consider a hidden service with 9 intro
points that starts getting less traffic and needs to go down to 5
intro points; in that case, the HS will discard 4 IPs, reducing the
average intro circuit lifetime. If the above procedure happens
multiple times, it will drastically reduce the average lifetime of
intro circuits.
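The effect on the average can be seen with a small worked example.
The 20 hour full lifetime and the 5 hour point at which the 4 intro
points get discarded are assumed numbers picked for illustration, not
measurements.

```python
def average_lifetime(lifetimes_hours):
    """Arithmetic mean of a list of lifetimes, in hours."""
    return sum(lifetimes_hours) / len(lifetimes_hours)


FULL_LIFETIME = 20.0  # hours; assumed value inside the 18-24 hour range
DISCARDED_AT = 5.0    # hours; assumed moment the HS drops from 9 to 5 IPs

# All 9 intro points live out their full lifetime.
without_shrink = [FULL_LIFETIME] * 9
# 5 intro points live fully, 4 are discarded early.
with_shrink = [FULL_LIFETIME] * 5 + [DISCARDED_AT] * 4

avg_without = average_lifetime(without_shrink)
avg_with = average_lifetime(with_shrink)
print(avg_without, round(avg_with, 2))
```

Under these assumptions a single shrink from 9 to 5 intro points pulls
the average down from 20 hours to about 13.3 hours.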
----
[*] https://people.torproject.org/~asn/desc_stats/lifetimes-2015-07-14-g.png
In this graph we present the number of introduction points of the
hidden services over time. As mentioned previously, hidden services
normally use 3 intro points, but they may increase that number if
they believe they are too popular.
Although this self-evaluation sounds like a neat feature, it also
means that an attacker can estimate the popularity of a hidden
service just by looking at the number of its introduction
points. While this might not sound like a dangerous information leak,
we think that there are legitimate use cases where the popularity of
a hidden service should be hidden [2].
Because we want to avoid this popularity leak and also because we
think that the number of introduction points is not that important
for scalability, we have disabled the dynamic intro point formula
completely (#4862). Now, hidden services establish 3 intro points by
default but operators have the option to tune that number using a
torrc parameter.
----
[*] https://people.torproject.org/~asn/desc_stats/lifetimes-2015-07-14-h.png
In this graph, we see the total number of relays that were used as
the introduction points of each hidden service over the whole
measurement period.
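Counting these relays amounts to taking the union of the intro point
sets seen in all fetched descriptors. A minimal sketch, assuming each
descriptor has already been reduced to a list of relay identifiers (the
identifiers below are placeholders):

```python
def distinct_intro_relays(descriptors):
    """Return the set of all relays that appear as an intro point
    in at least one of the given descriptors."""
    seen = set()
    for intro_points in descriptors:
        seen.update(intro_points)
    return seen


# Hypothetical descriptors of one hidden service over time.
descriptors = [
    ["relayA", "relayB", "relayC"],
    ["relayA", "relayB", "relayD"],
    ["relayE", "relayB", "relayD"],
]
relays = distinct_intro_relays(descriptors)
print(len(relays))
```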
This data is interesting because introduction points can estimate the
popularity of their hidden services, so a service should ideally not
expose itself to too many of them.
Looking at the last graph we see that the three normal hidden
services have used approximately 250 distinct relays as intro points.
In other words, about 250 relays had the chance to measure the
popularity of those hidden services. This seems reasonable given the
90-day measurement period, an assumed average lifetime of 20 hours
per introduction point and about 3 intro points per HS, which gives
us some confidence in the correctness of the whole system.
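The back-of-the-envelope estimate behind this can be written out as
follows. It uses only the figures from the paragraph above and counts
intro point selections, not distinct relays.

```python
MEASUREMENT_DAYS = 90
AVG_LIFETIME_HOURS = 20
INTRO_POINTS_PER_HS = 3

# How many times a single intro point slot gets rotated in 90 days.
rotations_per_slot = MEASUREMENT_DAYS * 24 / AVG_LIFETIME_HOURS

# Total intro point selections across all 3 slots.
intro_points_selected = rotations_per_slot * INTRO_POINTS_PER_HS

print(rotations_per_slot, intro_points_selected)
```

This gives 108 rotations per slot and 324 selections in total. The
number of distinct relays can be lower than that whenever the same
relay gets picked more than once, so a figure of about 250 is in the
same range.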
Meanwhile, hidden service (b) used about 2000 relays as IPs!
This was again caused by the dynamic intro point formula, which forced
it to rotate introduction points continuously. We have also heard
rumors that (b) was hit by a DoS attack at the beginning of our
measurement period, which caused it to rotate introduction points even
more.
We also see that hidden service (e) starts using more introduction
points from the 15th of May and onwards. This seems to be because
that hidden service started using two hidden service
instances for load balancing and each instance advertises a different
introduction point set.
With regard to load balancing, it's worth mentioning that we are
currently developing a tool called 'onionbalance' which will become
a better way of load-balancing hidden services. Donncha, the
author, released an alpha version just a few weeks ago which is worth
trying [3].
----
And this sums up our short analysis for today.
All in all, it seems that the system works properly most of the
time. Descriptors get published at the intended frequency and
introduction points get rotated as they should. The popularity leakage
we found for popular hidden services was also interesting, and we are
happy that it has since been fixed.
We hope you had fun reading our analysis, and please let us know if
there are any other privacy-preserving experiments you would like to
see in the hidden services world.
Footnotes:
[0]: https://www.torproject.org/docs/hidden-services.html.en
[1]: https://github.com/DonnchaC/tor-hs-descriptor-fetcher
[2]: https://lists.torproject.org/pipermail/tor-dev/2015-April/008597.html
[3]: https://github.com/DonnchaC/onionbalance/
https://lists.torproject.org/pipermail/tor-talk/2015-July/038312.html