[tor-dev] Revisiting prop224 time periods and HS descriptor upload/downloads

Mon Apr 4 16:13:39 UTC 2016

Hello,

During March we discussed the cell formats of prop224:
  https://lists.torproject.org/pipermail/tor-dev/2016-March/010534.html

The prop224 topic for this month has to do with the way descriptors get
uploaded and downloaded, how this is scheduled using time periods and how the
shared randomness subsystem interacts with all that.

Here are some discussion topics. Lots of text on the first two, less text on the rest:

- My main goal was to understand the prop224 sections [TIME-PERIODS] and [TIME-OVERLAP].

  Those sections specify a system where hidden services decide in a
  probabilistic manner _when_ to publish their descriptor so that not all
  hidden services publish their descriptors at the same moment and cause a
  thundering herd that stampedes the network.

  For this to work, time is split into time periods of k hours each. A few
  hours before each time period, there is an overlap period where hidden
  services start publishing their _next_ descriptors to HSDirs, so that when
  the upcoming time period starts, all the HSDirs have already received the
  descriptors and are ready to serve them.

  Consider the overlap period at the end of time period #N. During that overlap
  period, hidden services publish their descriptors for the future time period
  #N+1. To do so, hidden services also need to know the shared random value
  that will be active during time period #N+1, since that value is needed to
  find the responsible HSDirs. This means that the shared random value for
  time period #N+1 needs to be published _before_ the overlap period starts.

  The current version of proposal 224 does not guarantee this, since it splits
  time into time periods of 25 hours, which means that each day the start time
  shifts one hour forward. Since the start and end times of the time periods
  keep shifting, there will be cases where the right shared random value is
  not yet available when the overlap period starts.
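  To make the drift concrete, here is a small sketch. It counts time periods
  from the Unix epoch with no extra offset and assumes each period uses the
  SRV published at 00:00 UTC on the day it starts; both are assumptions for
  illustration, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption for illustration: period n starts n * length_hours after the
# Unix epoch. This is not the exact numbering rule of proposal 224.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def period_start_hours(length_hours, first, count):
    """UTC hour of day at which each of `count` consecutive periods starts."""
    return [
        (EPOCH + timedelta(hours=n * length_hours)).hour
        for n in range(first, first + count)
    ]


def srv_ready_at_overlap_start(start_hour, overlap_hours):
    """True if the SRV published at 00:00 UTC on the day a period starts
    (at `start_hour`) already exists when its overlap begins.

    Assumes 0 <= overlap_hours < 24. If the overlap would begin before
    00:00 of that day, the SRV does not exist yet."""
    return start_hour - overlap_hours >= 0


print(period_start_hours(25, 0, 5))       # [0, 1, 2, 3, 4]
print(period_start_hours(24, 0, 3))       # [0, 0, 0]
print(srv_ready_at_overlap_start(3, 6))   # False
print(srv_ready_at_overlap_start(12, 6))  # True
```

  With 25-hour periods the start hour advances by one every period and
  eventually cycles through every hour of the day, so some periods start
  early enough that a 6-hour overlap begins before the matching SRV exists.
  With 24-hour periods the start hour never moves.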

  So what to do?

  To fix this, I suggest we change the time period length to a day (24 hours).

  I also suggest we start each time period at 12:00 and end it 24 hours later
  at the same time, so that it works well with the current shared randomness
  schedule (where the new shared random value gets published at 00:00 every day).
  [It might be wiser to reverse those schedules: create the
   SRV at 12:00 and start the time period at 00:00]

  In any case, this is what it might look like:

	  +------------------------------------------------------------------+
	  |                                                                  |
	  | 00:00      12:00       00:00       12:00       00:00       12:00 |
	  | SRV#1      TP#1        SRV#2       TP#2        SRV#3       TP#3  |
	  |                                                                  |
	  |   $         |-----------$-----======|-----------$-----======|    |
	  |                            overlap12               overlap23     |
	  |                                                                  |
	  +------------------------------------------------------------------+

                                      Legend:    [TP#1 = Time Period #1]
                                                 [SRV#1 = Shared Random Value #1]

  So this gives us a window of 12 hours between the SRV generation and
  the start of the next time period. We can then easily fit an overlap period
  of 6 hours before the next time period starts. In the above diagram, the
  "equal sign" segments are the overlap periods. 'overlap12' is the overlap
  period from TP#1 to TP#2.
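  The schedule in the diagram can be sketched as follows: 24-hour time periods
  starting at 12:00 UTC, with a 6-hour overlap at the end of each one. The
  anchor (time period 0 starting at 1970-01-01 12:00 UTC) and the function
  names are hypothetical; only the 24h/12:00/6h numbers come from the
  proposal above.

```python
from datetime import datetime, timedelta, timezone

TP_LENGTH = timedelta(hours=24)
OVERLAP = timedelta(hours=6)
# Hypothetical anchor: time period 0 starts at 1970-01-01 12:00 UTC, so
# every time period starts at 12:00 UTC, 12 hours after the 00:00 SRV.
TP_EPOCH = datetime(1970, 1, 1, 12, tzinfo=timezone.utc)


def time_period_num(now):
    """Number of the time period containing `now` (a UTC-aware datetime)."""
    return (now - TP_EPOCH) // TP_LENGTH


def time_period_start(num):
    """Start time of time period `num`."""
    return TP_EPOCH + num * TP_LENGTH


def in_overlap(now):
    """True during the last OVERLAP hours before the next time period."""
    next_start = time_period_start(time_period_num(now) + 1)
    return next_start - OVERLAP <= now < next_start


utc = timezone.utc
print(in_overlap(datetime(2016, 4, 4, 14, 0, tzinfo=utc)))  # False
print(in_overlap(datetime(2016, 4, 5, 9, 0, tzinfo=utc)))   # True
```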

  Do you think that's reasonable? And do you see any problems with changing the
  time period length from 25 hours to 24 hours?

- So now that we have ironed out the time period stuff a bit, let's discuss
  the behavior that hidden services, clients and HSDirs should adopt.

  This email is already quite long, so I'm going to use examples instead of a
  formal specification. However, IMO this stuff needs to go into the proposal
  formally, so any help in formalizing it would be great.

  + Hidden Service behavior:

    Example 1: Our hidden service boots up at 14:00 of TP#1. In this case, we
     are nowhere close to the overlap period, so the hidden service should just
     publish its TP#1 descriptor to the HSDir hash ring using SRV#1 (which at
     that point should be in consensuses as "shared-rand-current-value").

     The hidden service might also want to calculate its overlap OFFSET (as
     specified in [TIME-OVERLAP]) and schedule a timer callback for publishing
     its TP#2 descriptors.

    Example 2: Our hidden service boots up at 03:00 of TP#1. That's outside of
     the overlap period again, but this time the hidden service needs to use the
     SRV from "shared-rand-previous-value", because the SRV was rotated at
     midnight and SRV#1 is now the previous value.

    Example 3: Our hidden service boots up at 09:00 of TP#1. That's inside the
     overlap period, so the hidden service should calculate its overlap
     OFFSET and compare it with the current time.

     If it has not passed, then we are in the exact same case as Example 2.

     If the overlap OFFSET _has_ passed, then the hidden service needs to act
     as in Example 2, and _also_ publish its TP#2 descriptors to a second set of
     HSDirs using SRV#2.

    I think these are all the cases for the hidden service, but I would like to
    formalize this in a way that can be written in the spec. Particularly, I'm
    not sure how to formalize which SRV to pick at a given time point.
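    One possible way to formalize the SRV choice, offered as a sketch and not
    as spec text: the descriptor for TP #n always uses SRV #n (the one
    generated at 00:00 on the day TP #n starts), and we map that to a
    consensus field by counting how many 24-hour SRV rotations have happened
    since SRV #n was created. It assumes the schedule from the diagram, a
    hypothetical epoch anchor, and a hypothetical `overlap_offset` argument
    standing in for the [TIME-OVERLAP] OFFSET; all names are illustrative.

```python
from datetime import datetime, timedelta, timezone

TP_LENGTH = timedelta(hours=24)
OVERLAP = timedelta(hours=6)
# Hypothetical anchor: time period 0 starts at 1970-01-01 12:00 UTC.
TP_EPOCH = datetime(1970, 1, 1, 12, tzinfo=timezone.utc)


def time_period_num(now):
    return (now - TP_EPOCH) // TP_LENGTH


def srv_field_for(tp_num, now):
    """Consensus field holding the SRV for time period `tp_num` at `now`.

    Assumed rule: TP #n uses the SRV created 12 hours before TP #n starts
    (00:00 UTC on that day). SRVs rotate every 24 hours."""
    srv_created = TP_EPOCH + tp_num * TP_LENGTH - timedelta(hours=12)
    rotations = (now - srv_created) // TP_LENGTH
    if rotations == 0:
        return "shared-rand-current-value"
    if rotations == 1:
        return "shared-rand-previous-value"
    raise ValueError("SRV for this time period is not in the consensus")


def descriptors_to_publish(now, overlap_offset):
    """(time period, SRV field) pairs the service should publish at `now`.

    `overlap_offset` is the service's delay after the overlap starts
    (somewhere in [0, OVERLAP)) before it publishes the next descriptor."""
    tp = time_period_num(now)
    plan = [(tp, srv_field_for(tp, now))]
    next_start = TP_EPOCH + (tp + 1) * TP_LENGTH
    if now >= next_start - OVERLAP + overlap_offset:
        plan.append((tp + 1, srv_field_for(tp + 1, now)))
    return plan


utc = timezone.utc
offset = timedelta(hours=2)
for label, now in [("Example 1", datetime(2016, 4, 4, 14, tzinfo=utc)),
                   ("Example 2", datetime(2016, 4, 5, 3, tzinfo=utc)),
                   ("Example 3", datetime(2016, 4, 5, 9, tzinfo=utc))]:
    print(label, [field for _, field in descriptors_to_publish(now, offset)])
# Example 1 ['shared-rand-current-value']
# Example 2 ['shared-rand-previous-value']
# Example 3 ['shared-rand-previous-value', 'shared-rand-current-value']
```

    The appeal of this formulation is that "which SRV" becomes a property of
    the descriptor's time period rather than of the wall-clock time; the
    consensus field name then falls out of the rotation count.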

  + Client behavior

    My current intuition with regard to client behavior is that clients should
    always fetch descriptors from the HSDirs of the _current_ time period. They
    should not concern themselves with the overlap stuff _at all_. The overlap
    system is there so that by the time the new time period starts, all the
    HSDirs have received the descriptors and are ready to help the
    clients. Clients should never notice the overlap stuff happening.

    For this reason I think we can remove this paragraph from the spec:

	   When a client is looking for a service, it must calculate its key
	   both for the current and for the subsequent period, to decide whether
	   the next period's key is valid yet.

    What do you think?
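    Under that intuition, the client side reduces to a single lookup: the
    current time period and the consensus field holding its SRV, with no
    reference to the overlap window or to the next period's key. A sketch,
    assuming the 12:00 time period / 00:00 SRV schedule from above and a
    hypothetical epoch anchor; `client_lookup` is an illustrative name.

```python
from datetime import datetime, timedelta, timezone

TP_LENGTH = timedelta(hours=24)
# Hypothetical anchor: time period 0 starts at 1970-01-01 12:00 UTC.
TP_EPOCH = datetime(1970, 1, 1, 12, tzinfo=timezone.utc)


def client_lookup(now):
    """Return (time period, consensus SRV field) a client would use.

    Only the current time period is considered; the overlap window never
    changes the answer."""
    tp = (now - TP_EPOCH) // TP_LENGTH
    tp_start = TP_EPOCH + tp * TP_LENGTH
    # Until the next 00:00 rotation (12h after tp_start) this TP's SRV is
    # the newest one; after the rotation it sits in the "previous" slot.
    if now - tp_start < timedelta(hours=12):
        return tp, "shared-rand-current-value"
    return tp, "shared-rand-previous-value"


utc = timezone.utc
print(client_lookup(datetime(2016, 4, 4, 14, tzinfo=utc))[1])  # shared-rand-current-value
print(client_lookup(datetime(2016, 4, 5, 9, tzinfo=utc))[1])   # shared-rand-previous-value
```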

  + HSDir behavior

    Currently the spec says the following:

	   Hidden service directories should accept descriptors at least [TODO:
	   how much?] minutes before they would become valid, and retain them
	   for at least [TODO: how much?] minutes after the end of the period.

    After discussion with David, we thought of chopping off the first part of
    that paragraph and not imposing any such weak restrictions for accepting
    descriptors (see #18332).

    We still have not decided about the second part of that paragraph, that is,
    how long descriptors should be retained after the end of the period. We
    currently think clock skew is the only thing that can bring clients to the
    wrong HSDir after the end of the period. Maybe an hour is OK? David
    suggested 12 hours. Current Tor keeps them for 48 hours... Any ideas?
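    If clock skew really is the only reason a client asks for an expired
    period's descriptor, then the retention window is simply the largest
    client clock skew we want to tolerate. A sketch that compares the three
    candidates for a client running 3 hours behind; it assumes the HSDir's
    own clock is correct, and the 3-hour skew and function names are
    hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention candidates mentioned in this thread.
RETENTION_CANDIDATES = {
    "1 hour": timedelta(hours=1),
    "12 hours (David)": timedelta(hours=12),
    "48 hours (current Tor)": timedelta(hours=48),
}


def should_retain(now, period_end, retention):
    """HSDir keeps a descriptor until `retention` after its period ends."""
    return now < period_end + retention


def served_to_skewed_client(period_end, skew, retention):
    """A client whose clock is `skew` behind keeps asking for the old period
    until real time reaches period_end + skew. Check the HSDir still holds
    the descriptor just before that moment."""
    last_request = period_end + skew - timedelta(seconds=1)
    return should_retain(last_request, period_end, retention)


end = datetime(2016, 4, 5, 12, tzinfo=timezone.utc)
for name, retention in RETENTION_CANDIDATES.items():
    print(name, served_to_skewed_client(end, timedelta(hours=3), retention))
# 1 hour False
# 12 hours (David) True
# 48 hours (current Tor) True
```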

  And this half-assedly sums up the behaviors of clients, HSes and HSDirs with
  regard to descriptor uploads and downloads. What is missing? And do you
  agree that parts of this should be in the proposal?

- We should revert the torspec commit: "prop224: avoid replicas with the same blinded key"
    https://gitweb.torproject.org/torspec.git/commit/?id=8df8c0584392240aa8fecbcd2164a4489be7ae1a

  It adds a whole lot of complexity to prop224 with no clear security benefit
  against realistic adversaries. Furthermore, the time period and descriptor
  download/upload logic of Tor gets very complicated with it.

  I discussed this with teor and special and found it reasonable.

- The randomized revision-counter logic should also be simplified or even removed:
    https://gitweb.torproject.org/torspec.git/commit/?id=01119bf1291a40aa309dfb7d76edf790133f05b9

  I haven't looked much into this yet. If someone has thoughts please let me know.

- We should use fresh salt every time we rebuild the descriptor, but not for every replica:
    https://gitweb.torproject.org/torspec.git/commit/?id=01e865d592ffcbb67a0e6631c56e5b8048ea6065
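  To spell out the distinction: a salt is drawn once per rebuild and shared
  by all replicas produced by that rebuild, instead of once per replica. In
  this sketch a "descriptor" is just a dict, and the 16-byte salt length and
  `build_descriptor_set` are hypothetical, not the prop224 encoding.

```python
import os

SALT_LEN = 16  # hypothetical salt length, for illustration only


def build_descriptor_set(body, n_replicas):
    """Rebuild all replicas of a descriptor with one fresh salt.

    A new salt is drawn every time this is called (i.e. on every rebuild),
    but every replica produced by the same rebuild carries that same salt."""
    salt = os.urandom(SALT_LEN)
    return [{"replica": r, "salt": salt, "body": body} for r in range(n_replicas)]
```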

- teor says we should revert the double hashing here, and just use tor's random API:
    https://gitweb.torproject.org/torspec.git/commit/?id=93f47f4f4e7614d4b3debfe9b5f3a22bfe5d64b1

peace