[tor-dev] Potential projects for SponsorR (Hidden Services)
George Kadianakis
desnacked at riseup.net
Mon Oct 20 13:37:49 UTC 2014
Hello,
this is an attempt to collect tasks that should be done for
SponsorR. You can find the SponsorR page here:
https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
I'm going to focus only on the subset of those categories that
Roger/David told me are the most important for the sponsor. These are:
- Safe statistics collection
- Tor controller API improvements
- Performance improvements
- Opt-in HS indexing service
I haven't yet split projects into deliverables; this is an intermediate
step towards that. The next step is to filter and then ticketify what we
have. After that we need to prioritize and pick the projects that will
become deliverables.
In each category, I have loosely ordered the items (more important
items are usually near the top, but not always). I have also tried to
include all the tickets that are marked as
SponsorR in trac.
So, let's go:
== Safe statistics collection ==
We've discussed this quite a bit over the past year and I think we all
pretty much agree on which stats are safe to collect and which are not.
In particular, collecting the number of HS circuits and the
traffic volume from RPs (#13192) is harmless [0] and useful
information to have. We need to clean up Roger's patch to add that
information to extra-info descriptors, and then do some
visualisations. That would give us a good idea of how much HSes are used.
OTOH, other statistics like "# of HS descriptors" are not that harmless
and the upcoming HS redesign will block us from getting this information
anyway.
For now, I think we should focus on #13192 for this project.
== Tor controller API improvements ==
To better refine this project, we should think about what we want to
get out of it. Here are some outcomes:
a) A better control API allows us to perform better performance
measurements for HSes.
Karsten in #1944 worked on performance measurements of HS circuit
establishment. You can find his very useful results here:
http://ec2-54-92-231-52.compute-1.amazonaws.com/
We should understand exactly how Karsten is gathering those events,
and see whether we can improve the timing accuracy or if we are
missing any events. We also need to figure out how to take useful
measurements of causally related events, like the race between the
INTRODUCE_ACK and RENDEZVOUS2 cells. We also need to find a way
to match rendezvous circuits with introduction circuits:
https://trac.torproject.org/projects/tor/ticket/1944#comment:35
All in all, this seems like a project worth doing right because it
will be useful in the future. It can even act as an automated
regression test.
b) This might also be a good time to start working on automated
integration tests for HSes.
It should be possible to spin up private Chutney networks and test
that particular HSes are reachable. Or perform regression tests;
for example, Roger recently suggested writing a regression test to
make sure that clocks don't need to be synchronized to build HS
circuits (#13494).
We should also make testing networks better for HS testing:
- #13401 TestingTorNetwork should crank down RendPostPeriod too?
c) Tor should better expose error messages of failed operations. For
example, this could allow TBB to inform users whether they mistyped
the onion address or the HS is actually down, and it would also let
us do #13208. Proposal 229 and ticket #13212 are related to
this. We should see whether the PT team is planning to implement
proposal 229 and how we can synchronise.
d) There are various projects that are using HSes these days (TorChat,
Pond, GlobaLeaks, Ricochet, etc.). We should think about whether we
want to support these use cases and how we can make their lives easier.
For example, Fabio has been asking for a way to spin up HSes using
the control port (#5976). What other features do people want from
the control port?
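As a rough illustration of item d): until something like #5976 exists, a
controller can already ask a running tor to configure a hidden service by
sending SETCONF with HiddenServiceDir and HiddenServicePort over the
control port. The Python sketch here only builds the command line. The
helper names, the directory path and the ports are made up for the
example, and authentication and reply handling are left out. As far as I
know SETCONF replaces the whole set of configured services, so the sketch
assumes the caller passes every service it wants to keep.

```python
def quote(value):
    """Quote a value as a control-protocol quoted string."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_hs_setconf(services):
    """Build one SETCONF line that configures every service in `services`.

    `services` is a list of (hs_dir, [(virtual_port, target), ...]) tuples.
    This is a hypothetical helper, not part of any existing library.
    """
    parts = ["SETCONF"]
    for hs_dir, ports in services:
        if not ports:
            raise ValueError("service %r needs at least one port" % hs_dir)
        # Each HiddenServiceDir is followed by the ports that belong to it.
        parts.append("HiddenServiceDir=" + quote(hs_dir))
        for virtual_port, target in ports:
            parts.append(
                "HiddenServicePort=" + quote("%d %s" % (virtual_port, target))
            )
    return " ".join(parts) + "\r\n"


if __name__ == "__main__":
    # Example values only; nothing is sent anywhere.
    print(build_hs_setconf([("/tmp/example-hs", [(80, "127.0.0.1:8080")])]))
```

The awkward part this shows is that the controller has to manage a
directory on disk and resend the full service list, which is presumably
why people keep asking for a dedicated command.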
And here are some more tickets marked as SponsorR from this category:
- #8993 Better hidden service support on Tor control interface
- #13206 Write up walkthrough of control port events when accessing a hidden service
- #2554 extend torperf to record hidden service time components
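To make the measurement idea from item a) and #13206 a bit more
concrete, here is a small Python sketch that takes timestamped CIRC
event lines and computes the time between two HS states for one onion
address. It relies on the HS_STATE and REND_QUERY fields that, as far as
I can tell from control-spec, CIRC events can carry; whether every state
actually includes REND_QUERY needs checking. The timestamps are assumed
to be recorded by whoever reads the control port, the function names are
hypothetical, and the sample lines in the usage part are invented.
Grouping by REND_QUERY only matches circuits per onion address, not per
connection attempt, so it does not solve the matching problem from #1944.

```python
def parse_circ_event(line):
    """Parse a '650 CIRC ...' line into a dict, or return None."""
    tokens = line.strip().split(" ")
    if len(tokens) < 4 or tokens[0] != "650" or tokens[1] != "CIRC":
        return None
    event = {"id": tokens[2], "status": tokens[3]}
    for token in tokens[4:]:
        key, sep, value = token.partition("=")
        # Keep only KEY=value fields; path entries such as $FP=nick are skipped.
        if sep and key.replace("_", "").isalpha() and key.isupper():
            event[key] = value
    return event


def hs_state_times(timed_lines):
    """Map (REND_QUERY, HS_STATE) to the first timestamp it was seen at."""
    first_seen = {}
    for timestamp, line in timed_lines:
        event = parse_circ_event(line)
        if not event or "HS_STATE" not in event or "REND_QUERY" not in event:
            continue
        key = (event["REND_QUERY"], event["HS_STATE"])
        first_seen.setdefault(key, timestamp)
    return first_seen


def elapsed(first_seen, onion, start_state, end_state):
    """Seconds between two HS states for one onion address, or None."""
    start = first_seen.get((onion, start_state))
    end = first_seen.get((onion, end_state))
    if start is None or end is None:
        return None
    return end - start


if __name__ == "__main__":
    # Invented sample data, only to show the call pattern.
    log = [
        (10.0, "650 CIRC 7 BUILT $AAAA~a PURPOSE=HS_CLIENT_REND "
               "HS_STATE=HSCR_ESTABLISHED_WAITING REND_QUERY=exampleonion"),
        (12.25, "650 CIRC 7 BUILT $AAAA~a PURPOSE=HS_CLIENT_REND "
                "HS_STATE=HSCR_JOINED REND_QUERY=exampleonion"),
    ]
    seen = hs_state_times(log)
    print(elapsed(seen, "exampleonion",
                  "HSCR_ESTABLISHED_WAITING", "HSCR_JOINED"))
```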
== Performance Improvements ==
This is the juiciest section. How can we make HS performance better?
IIUC, we are mainly interested in client-side performance, but if a
change makes both sides faster that's even better.
Some projects:
a) Looking at Karsten's #1944 results http://ec2-54-92-231-52.compute-1.amazonaws.com/
we see that fetching HS descriptors takes much more time than it should.
I wonder why this is the case. Is there another ntohl bug there?
We should perform measurements and get a good understanding of
what's going on in this step. Here are some tickets that Roger
opened to do exactly that:
- #13208 What's the average number of hsdir fetches before we get the hsdesc?
- #13209 Write a hidden service hsdir health measurer
And here is a ticket with a potential issue:
- #13207 Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?
b) Improving the other parts of the circuit establishment process is
also important:
- #8239 Hidden services should try harder to reuse their old intro points
- #3733 Tor should abandon rendezvous circuits that cause a client request to time out
- #13222 Clients accessing a hidden service can establish their rend point in parallel to fetching the hsdesc
Furthermore, one area of Tor that might give us better performance,
but that we haven't really explored yet, is preemptive circuits. #13239
is about building more internal circuits for HSes.
And here is a ticket suggesting more measurements:
- #13194 Track time between ESTABLISH_RENDEZVOUS and RENDEZVOUS1 cell
c) Another important project in this area is parallelizing HS crypto.
I haven't looked at what this would actually entail, but it will
probably involve implementing the undone parts of proposal 220/224.
d) This might be the time to implement Encrypted Services? Many people
have been asking for this feature:
https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/ideas/xxx-encrypted-services.txt
e) Following the trail of #13207, we should look at all the magic
numbers currently used by HSes, document them and see if they make
sense. This includes the number of IPs (#8950), the number of
HSDirs/replicas, the intro point expiration date, etc.
Also, we should revisit the flags used when doing path selection
for RPs, IPs, etc.
f) On a more researchy note, this might also be a good point to start
poking at the HS scalability project since it will really affect HS
performance.
We should look at Christopher Baines' ideas and write a Tor
proposal out of them:
https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html
https://lists.torproject.org/pipermail/tor-dev/2014-May/006812.html
Last time I looked, Christopher's ideas required implementing
proposal 225 and #8239.
g) All the projects above aim at improving circuit establishment
performance, but none of them deal with performance after the HS
circuit has been established.
On an even more researchy note, Qingping Hou et al wrote a proposal
to reduce the length of HS circuits to 5 hops (down from 6). You
can find their proposal here:
https://lists.torproject.org/pipermail/tor-dev/2014-February/006198.html
The project is crazy and dangerous and needs lots of analysis, but
it's something worth considering. Maybe this is a good time to do
this analysis?
h) Back to the community again. A few messaging protocols have
recently appeared that inherently use HSes to provide link layer
confidentiality and anonymity [1]. Examples include Pond,
Ricochet and TorChat.
Some of these applications are creating one or more HSes per user,
with the assumption that HSes are something easy to make and there
is no problem in having lots of them. People are wondering how well
these applications scale and whether they are using the Tor network
the right way. See John Brooks' mail for a small analysis:
https://moderncrypto.org/mail-archive/messaging/2014/000434.html
It might be worth researching these use cases to see how well Tor
supports them and how they can be supported better (or whether they
are a bad idea entirely).
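To put rough numbers on #13222 from item b): if a client establishes its
rendezvous point while the descriptor fetch is still running, the two
steps overlap instead of adding up. The Python sketch here is only a
back-of-the-envelope model. It assumes the introduction step can start
only once both the descriptor and the rendezvous point are ready, and
the timings in the usage part are invented, not measurements.

```python
def sequential_setup(t_desc, t_rend, t_intro):
    """Fetch the descriptor, then build the rend point, then introduce."""
    return t_desc + t_rend + t_intro


def parallel_setup(t_desc, t_rend, t_intro):
    """Fetch the descriptor and build the rend point at the same time."""
    return max(t_desc, t_rend) + t_intro


def saving(t_desc, t_rend, t_intro):
    """Time saved by overlapping; equals min(t_desc, t_rend) in this model."""
    return (sequential_setup(t_desc, t_rend, t_intro)
            - parallel_setup(t_desc, t_rend, t_intro))


if __name__ == "__main__":
    # Invented example timings in seconds.
    print(sequential_setup(4.0, 1.5, 2.0))
    print(parallel_setup(4.0, 1.5, 2.0))
    print(saving(4.0, 1.5, 2.0))
```

In this model the saving is bounded by the shorter of the two overlapped
steps, so if descriptor fetching really dominates, fixing the fetch
itself matters more than the overlap.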
== Opt-in HS indexing service ==
This seems like a fun project that can be used in various ways in the
future. Of course, the feature must remain opt-in so that only
services that want to be public will surface.
For this project, we could make some sort of 'HS authority' which
collects HS information (the HS descriptor?) from volunteering
HSes. It's unclear who will run an HS authority; maybe we can work
with ahmia so that they integrate it in their infrastructure?
If we are more experimental, we can even build a basic petname system
using the HS authority [2]. Maybe just a "simple" NAME <-> PUBKEY
database where HSes can register themselves in a FIFO fashion. This
might cause tons of domain camping and attempts at dirty Sybil
attacks, but it might develop into something useful. Worst case we can
shut it down and call the experiment done? AFAIK, I2P has been doing
something similar at https://geti2p.net/en/docs/naming
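To show how small the "simple" database could be, here is a Python
sketch of a first-come, first-served NAME <-> PUBKEY registry. The class
and method names are hypothetical, and treating the mapping as strictly
one-to-one with case-insensitive names is my assumption, not something
decided above. It does nothing about camping or Sybil registrations.

```python
class NameRegistry:
    """In-memory first-come, first-served name registry (sketch only)."""

    def __init__(self):
        self._key_by_name = {}
        self._name_by_key = {}

    @staticmethod
    def _normalize(name):
        return name.strip().lower()

    def register(self, name, pubkey):
        """Claim `name` for `pubkey`; return True if the claim succeeded."""
        name = self._normalize(name)
        if not name:
            raise ValueError("empty name")
        # First registration wins: taken names and already-named keys
        # are both rejected.
        if name in self._key_by_name or pubkey in self._name_by_key:
            return False
        self._key_by_name[name] = pubkey
        self._name_by_key[pubkey] = name
        return True

    def lookup(self, name):
        """Return the public key registered for `name`, or None."""
        return self._key_by_name.get(self._normalize(name))

    def name_of(self, pubkey):
        """Return the name registered for `pubkey`, or None."""
        return self._name_by_key.get(pubkey)


if __name__ == "__main__":
    registry = NameRegistry()
    print(registry.register("Example", b"key-one"))   # first claim
    print(registry.register("example", b"key-two"))   # name already taken
    print(registry.lookup("EXAMPLE"))
```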
== Security / Miscellaneous ==
I also noticed that some tickets on trac were assigned to SponsorR but
I couldn't fit them in the above categories. They are mainly security
enhancements or code improvements. Here is a dump of the tickets:
Security:
- #13214 HS clients don't validate descriptor-id returned by HSDir
- #7803 Clients shouldn't send timestamps in INTRODUCE1 cells
- #8243 Getting the HSDir flag should require more effort
- #2715 Is rephist-calculated uptime the right metric for HSDir assignment?
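For context on #13214, and on the replica count mentioned under
Performance item e), here is a Python 3 sketch of how I read the current
(v2) descriptor-id computation in rend-spec: descriptor-id =
H(permanent-id | H(time-period | descriptor-cookie | replica)), with H
being SHA-1 and the time period shifted by the first byte of the
permanent id. Treat the byte encodings (4-byte big-endian time period,
1-byte replica) as my reading of the spec and code, to be verified
before relying on them. The function names are hypothetical, and the
comparison helper is only one way a client-side check could look.

```python
import hashlib
import hmac
import struct

SECONDS_PER_PERIOD = 86400  # descriptor ids rotate once per day


def time_period(now, permanent_id):
    """Time period number, offset by the first byte of the permanent id."""
    offset = permanent_id[0] * SECONDS_PER_PERIOD // 256
    return (now + offset) // SECONDS_PER_PERIOD


def descriptor_id(permanent_id, now, replica, descriptor_cookie=b""):
    """Compute the 20-byte descriptor id for one replica."""
    if len(permanent_id) != 10:
        raise ValueError("permanent id must be 80 bits")
    secret_id_part = hashlib.sha1(
        struct.pack(">I", time_period(now, permanent_id))
        + descriptor_cookie
        + bytes([replica])
    ).digest()
    return hashlib.sha1(permanent_id + secret_id_part).digest()


def matches_request(returned_id, permanent_id, now, replica):
    """Check that a returned descriptor id is the one we would ask for."""
    expected = descriptor_id(permanent_id, now, replica)
    return hmac.compare_digest(returned_id, expected)


if __name__ == "__main__":
    pid = bytes(10)  # placeholder permanent id, not a real service
    for replica in (0, 1):
        print(replica, descriptor_id(pid, 1413812269, replica).hex())
```

Each replica gives a different descriptor id, and so a different set of
responsible HSDirs, which is why the replica count is one of the magic
numbers worth documenting.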
Miscellaneous:
- #13223 Refactor rend_client_refetch_v2_renddesc()
- #13287 Investigate mysterious 24-hour lump in hsdir desc fetches
- #8902 Rumors that hidden services have trouble scaling to 100 concurrent connections
== Epilogue ==
What useful projects/tickets did I forget here?
Which of the above tasks should we not do? I just went ahead and
wrote down all the projects I could think of, with the idea that we
will filter stuff later.
Thanks!
Footnotes:
[0]: since RPs are picked at random by the client and not by the HS.
[1]: see https://moderncrypto.org/mail-archive/messaging/2014/000434.html
[2]: or if someone is more crazy, try to integrate GNUnet's GNS:
https://gnunet.org/gns