[tor-dev] Understanding the HS subsystem

Wed Feb 4 16:39:10 UTC 2015

[Declassifying this discussion and posting on [tor-dev]] 

David Goulet <dgoulet at ev0ke.net> writes:

> Hello HS elves!
>
> I wrote a document to organize my thoughts and also list what we have in
> the bug tracker right now about HS behaviours that we want to
> understand/measure/assess/track.
>
> It's a bit long but you can skip the first section describing the
> tickets and go right to the "How" and the work to be done.
>
> Nick, you will see there is a SponsorS component but I didn't go into
> hard details there. We all know we need a testing network but for now
> I'm more focused on making sure we can collect the right data (for HS).
>
> A very important part I would like feedback on is the "HS health service",
> for which I would like us all to agree on its usefulness and on the way to
> do it properly.
>
> Cheers!
> David
>
> This document describes the methodology and technical details of a hidden
> service measurement framework/tool/<insert what this is>.
>
> NOTE: This is NOT intended to be run in the real Tor network. ONLY testing.
>
> Why and What
> -----
>
> The goal is to answer some questions we have regarding HS behaviours. Most
> of them have a ticket assigned but need an experiment and/or added
> feature(s) so we can measure what we need.
>
> - Is rend_cache_clean_v2_descs_as_dir cutoff crazy high?
>   https://trac.torproject.org/projects/tor/ticket/13207
>
>   In order to address this, it seems we need a way to measure all the
>   interactions with the cache of an HSDir and a client. We need to assess
>   the rend cache cleanup timing values, which will also help with the upload
>   and refetch timings.
>
> - What's the average number of hsdir fetches before we get the hsdesc?
>   https://trac.torproject.org/projects/tor/ticket/13208
>
>   Using the control port for that is trivial but this needs a testing
>   network to be set up with actual load on it.
>
>   It could also be set up as a feature of an "HS health measurement tool"
>   with a client fetching the same .onion address over and over, at random
>   times.
>
> - Write a hidden service hsdir health measurer
>   https://trac.torproject.org/projects/tor/ticket/13209
>
>   This is a useful one, being able to correlate relay churn and HS desc.
>   fetch. This one needs more brainstorming on how we could set up some sort
>   of client or service that reports/logs the results of crunching the
>   consensus for the HSDirs of a specific .onion address that we know and
>   control.
>
> - Refactor rend_client_refetch_v2_renddesc()
>   https://trac.torproject.org/projects/tor/ticket/13223
>
>   Ensure correctness of this very important function that does fetches for
>   the client. It's in there that the HSDirs (with replicas) are looped on so
>   the descriptor can be fetched.
>
> - Maybe we want three preemptive internal circs for hidden services?
>   https://trac.torproject.org/projects/tor/ticket/13239
>
>   That's pretty trivial to measure and quantify with the tracing
>   instrumentation added in Tor. No need for a new feature but an experiment
>   has to be designed to measure 2 internal circuits versus 3.
>
> - rend_consider_services_upload() sets initial next_upload_time which is
>   clobbered when first intro point established?
>   https://trac.torproject.org/projects/tor/ticket/13483
>
>   Is the RendPostPeriod option working correctly? What's the exact
>   relation in time between service->desc_is_dirty and the upload time of a
>   new descriptor?
>
> - Do we have edge cases with rend_consider_descriptor_republication()? Can
>   we refactor it to be cleaner?
>   https://trac.torproject.org/projects/tor/ticket/13484
>
>   This is a core function that is called every second so we should make sure
>   it is behaving as expected and not trying to do unneeded uploads.
>
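
For the hsdir health measurer (#13209), the descriptor IDs that decide
which HSDirs are responsible can be derived from the .onion address
alone. A minimal Python sketch of the version 2 computation as described
in rend-spec.txt follows. The function name and the all-'a' address are
made up for illustration, and client authorization is ignored unless a
descriptor cookie is passed in.

```python
import base64
import hashlib
import struct

PERIOD = 86400  # v2 descriptor validity period in seconds


def v2_descriptor_id(onion_address, now, replica, descriptor_cookie=b""):
    """Return the v2 descriptor ID (40 hex chars) for one replica.

    Assumes the rend-spec v2 formulas:
      time-period   = (now + first-id-byte * 86400 / 256) / 86400
      secret-id     = SHA1(time-period | descriptor-cookie | replica)
      descriptor-id = SHA1(permanent-id | secret-id)
    """
    label = onion_address.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    # The 16 base32 characters encode the 80-bit permanent ID.
    permanent_id = base64.b32decode(label.upper())
    time_period = (now + permanent_id[0] * PERIOD // 256) // PERIOD
    secret_id = hashlib.sha1(
        struct.pack(">I", time_period) + descriptor_cookie + bytes([replica])
    ).digest()
    return hashlib.sha1(permanent_id + secret_id).hexdigest().upper()


if __name__ == "__main__":
    # Hypothetical address; a real run would use the HS we control.
    for replica in (0, 1):
        print(replica, v2_descriptor_id("aaaaaaaaaaaaaaaa.onion", 1423067950, replica))
```

The IDs change once per period, at an offset that depends on the first
byte of the permanent ID, so a measurer has to recompute them over time.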

Hello,

nice list of tickets. Here are some more ideas if you are looking for
more brainstorming action.

There is #3733 which is about a behavior that affects performance and
could benefit from a testing network.

And there is #8950 which is about the number of IPs per hidden
service. It's very unclear whether this functionality works as
intended or whether it's a good privacy idea. 

And there is also #13222 but it's probably easier to hack the solution
here, than to measure its severity.

>
> How
> -----
>
> Here are some steps I think are needed to be able to measure and answer the
> Why section.
>
>   1) Dump the uploaded/fetched HS in a human readable way.
>     * Allows us to track descriptors over time while testing and analyse
>       them afterwards by correlating events with a readable desc. This kind
>       of feature will also be useful for people crawling HS on SponsorR.
>     * Should be a control event like for instance (ONLY client side):
>       > setconf HSDESC_DUMP /tmp/my/dir
>
>   2) How many HSDirs (including replicas) have been probed for one
>   single .onion request. (Which should be repeated a lot for significant
>   results.)
>     * Why have we probed 1 or 5?
>     * What made us retry? Failure code?
>     * Was the descriptor actually alive on the HSDir? If not, when did
>     it move? (Correlate timings between HSDir and client in a testing network)
>
>   3) HS desc cache tracker. We want to know, very precisely, how things are
>   moving in the cache especially on the HSDir cache side.
>     * When and why is an HS desc removed?
>     * Why hasn't it been stored in the cache?
>     * How often and when is a descriptor requested?
>
>   4) Track the HS descriptor upload. Log at what time it was done. Use this
>   to correlate with RendPostPeriod or when desc_is_dirty is set. It should
>   also be correlated with the actual state of the HSDir. Did it already have
>   it? Is the HSDir gone?
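
Step 2) can already be approximated from the control port. Below is a
rough Python sketch that counts how many HSDirs were asked before each
successful fetch and tallies the failure reasons. It assumes event lines
shaped like the HS_DESC events in control-spec.txt (action, address,
auth type, HSDir, descriptor ID, optional REASON=); the function name,
the relay names and the sample lines are invented for illustration.

```python
from collections import Counter, defaultdict


def summarize_hs_desc_events(event_lines):
    """Return ({address: [requests before each RECEIVED]}, Counter of reasons)."""
    pending = defaultdict(int)     # requests sent since the last success
    attempts = defaultdict(list)   # per address, one entry per success
    failure_reasons = Counter()
    for line in event_lines:
        fields = line.split()
        if len(fields) < 4 or fields[:2] != ["650", "HS_DESC"]:
            continue  # not an HS_DESC event line
        action, address = fields[2], fields[3]
        if action == "REQUESTED":
            pending[address] += 1
        elif action == "FAILED":
            for field in fields[4:]:
                if field.startswith("REASON="):
                    failure_reasons[field[len("REASON="):]] += 1
        elif action == "RECEIVED":
            attempts[address].append(pending[address])
            pending[address] = 0
    return dict(attempts), failure_reasons


# Invented sample, not captured from a real tor instance.
DIR_A = "$" + "A" * 40 + "~relayA"
DIR_B = "$" + "B" * 40 + "~relayB"
SAMPLE_EVENTS = [
    f"650 HS_DESC REQUESTED exampleaddr NO_AUTH {DIR_A} descid1",
    f"650 HS_DESC FAILED exampleaddr NO_AUTH {DIR_A} descid1 REASON=NOT_FOUND",
    f"650 HS_DESC REQUESTED exampleaddr NO_AUTH {DIR_B} descid1",
    f"650 HS_DESC RECEIVED exampleaddr NO_AUTH {DIR_B} descid1",
    f"650 HS_DESC REQUESTED exampleaddr NO_AUTH {DIR_A} descid1",
    f"650 HS_DESC RECEIVED exampleaddr NO_AUTH {DIR_A} descid1",
]

if __name__ == "__main__":
    attempts, reasons = summarize_hs_desc_events(SAMPLE_EVENTS)
    for address, counts in attempts.items():
        print(address, counts, sum(counts) / len(counts))
    print(dict(reasons))
```

This only sees what the client reports. Whether the descriptor was really
present on the HSDir at that time still needs data from the HSDir side.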
>
>
> What to be done
> ----------------
>
> * Collect data
>
> "Collect it all" --> https://i.imgur.com/tVXAcGGl.jpg
>
> It's clear that we have to collect more data from the HS subsystem. Most of
> it can be collected through the control port but some is missing.
> Measuring the precise timing of HS actions (for instance, a descriptor
> store) is not possible with the control port right now and also might not be
> that relevant since the job of the control port is to report high level
> events and push commands to the tor daemon.
>
> Tracing should be used here with a set of events added to the HS subsystem
> to collect the information we need so it can be analyzed after the
> experiment is run. This is only for performance measurement, the rest should
> as much as possible use the control port.
>
> * Testing network (much SponsorS)
>
> Once we are able to extract all the data we need, it is time to design
> experiments that allow us to run scenarios and collect/analyze what we want.
> A scenario could be this example, with a set of questions we want to answer
> going with it:
>
> * 50 clients randomly accessing an HS in a busy tor network.
>   - What is the failure rate of desc. fetch, RP establishment, ...?
>   - What are the timings of each component of the HS subsystem?
>   - What are the outliers of the whole process of establishing a connection
>     to the HS?
>   - How much did relay churn affect HS reachability?
>
> And dump a human readable report/graphs, whatever is useful for us to
> investigate or assess the HS functionalities.
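
A report for such a scenario could start from something as small as the
Python sketch below. The record layout (success flag plus connection
time in seconds), the function name and the "more than 3x the median"
outlier rule are all assumptions picked for illustration, not something
tor produces today.

```python
import statistics


def summarize_attempts(records, outlier_factor=3.0):
    """Summarize (succeeded, seconds) records from one scenario run.

    `seconds` is ignored for failed attempts and may be None.
    """
    if not records:
        return {"failure_rate": 0.0, "median": None, "outliers": []}
    timings = [seconds for ok, seconds in records if ok]
    failures = sum(1 for ok, _ in records if not ok)
    median = statistics.median(timings) if timings else None
    outliers = []
    if median is not None:
        # Arbitrary rule: anything slower than outlier_factor * median.
        outliers = sorted(t for t in timings if t > outlier_factor * median)
    return {
        "failure_rate": failures / len(records),
        "median": median,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical measurements, one tuple per client connection attempt.
    sample = [(True, 1.0), (True, 1.2), (False, None), (True, 0.9), (True, 6.0)]
    print(summarize_attempts(sample))
```

The same summary could be computed per component (desc. fetch, RP
establishment, ...) once the tracing events provide separate timings.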
>
> * HS health service
>
> ref: https://trac.torproject.org/projects/tor/ticket/13209
>
> What about a web page that prints the result of:
>
>   1) Fetch the last 3 consensuses (thus 3 hours)
>   2) Find the union of all HSDirs responsible for a.onion (we control that
>   HS and it should be up at all times, else the results are meaningless.)
>   3) Fetch the descriptor on each of them
>   4) Graph/log how many of them had it thus giving us a probability of
>   reaching the HS within a time period.
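
Steps 1) to 4) can be sketched in Python as below, leaving the actual
fetch in 3) as a callback since that is the open question. It assumes
the v2 rule that the responsible HSDirs are the three HSDir relays whose
fingerprints follow the descriptor ID on the ring, and that fingerprints
and descriptor IDs are hex strings of equal length. All names here are
hypothetical.

```python
import bisect


def responsible_hsdirs(hsdir_fingerprints, descriptor_id, spread=3):
    """Return the `spread` HSDirs following descriptor_id on the ring."""
    ring = sorted(hsdir_fingerprints)
    if not ring:
        return []
    start = bisect.bisect_right(ring, descriptor_id)
    count = min(spread, len(ring))
    # Wrap around the end of the sorted list to make it circular.
    return [ring[(start + i) % len(ring)] for i in range(count)]


def hs_health(consensuses, descriptor_ids, has_descriptor):
    """Union the responsible HSDirs over several consensuses and probe them.

    consensuses:    list of lists of HSDir fingerprints, one per consensus
    descriptor_ids: one descriptor ID per replica
    has_descriptor: callback(fingerprint) -> bool, stands in for step 3)
    """
    candidates = set()
    for hsdirs in consensuses:
        for descriptor_id in descriptor_ids:
            candidates.update(responsible_hsdirs(hsdirs, descriptor_id))
    hits = {fp for fp in candidates if has_descriptor(fp)}
    ratio = len(hits) / len(candidates) if candidates else 0.0
    return candidates, ratio


if __name__ == "__main__":
    # Toy two-character "fingerprints" to keep the example readable.
    consensuses = [["10", "40", "80", "C0"], ["10", "40", "90", "C0", "F0"]]
    serving = {"80", "C0", "10"}
    dirs, ratio = hs_health(consensuses, ["50"], lambda fp: fp in serving)
    print(sorted(dirs), ratio)
```

The ratio is only a rough stand-in for the probability in 4), because a
real client picks among the HSDirs of its own consensus, not the union.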
>
> So 3) is the tricky one. There are multiple ways of achieving that possibly:
>
>   i) New SOCKS command to tor that a client could use.
>      - Command would have an onion address with it and the reply should be 0
>      or 1 (successful attempt or not) with the HSDir fingerprint with it.
>
>   ii) Control event.
>       > setconf HSDESC_FETCH_ALL <this_is_a.onion>
>       [...]
>       Prints out the results as they come in with the HSDir information.
>
>   iii) A weird way of doing this with an option "tor --fetch-on-all-hs-dir
>        this_address.onion", print out the results and quit.
>
> I much prefer i) and ii) here. Not sure which one is best though.

Hm, I think I like (ii) here. It doesn't seem to be much more work
than (i) and a few researchers have been asking for such functionality
for years.