Proposal 160: Authorities vote for bandwidth offsets in consensus

Tue May 12 14:05:11 UTC 2009

On Mon, May 04, 2009 at 07:12:10PM -0400, Roger Dingledine wrote:
> Filename: 160-bandwidth-offset.txt
> Title: Authorities vote for bandwidth offsets in consensus
> Version: $Revision$
> Last-Modified: $Date$
> Author: Roger Dingledine
> Created: 4-May-2009
> Status: Open
> Target: 0.2.2.x
> 
> 1. Motivation
> 
>   As part of proposal 141, we moved the bandwidth value for each relay
>   into the consensus. Now clients can know how they should load balance
>   even before they've fetched the corresponding relay descriptors.
> 
>   Putting the bandwidth in the consensus also lets the directory
>   authorities choose more accurate numbers to advertise, if we come up
>   with a better algorithm for deciding weightings.
> 
>   Our original plan was to teach directory authorities how to measure
>   bandwidth themselves; then every authority would vote for the bandwidth
>   it prefers, and we'd take the median of votes as usual.
> 
>   The problem comes when we have 7 authorities, and only a few of them
>   have smarter bandwidth allocation algorithms. So long as the majority
>   of them are voting for the number in the relay descriptor, the minority
>   that have better numbers will be ignored.
> 
> 2. Options
> 
>   One fix would be to demand that every authority also run the
>   new bandwidth measurement algorithms: in that case, part of the
>   responsibility of being an authority operator is that you need to run
>   this code too. But in practice we can't really require all current
>   authority operators to do that; and if we want to expand the set of
>   authority operators even further, it will become even more impractical.
>   Also, bandwidth testing adds load to the network, so we don't really
>   want to require that the number of concurrent bandwidth tests match
>   the number of authorities we have.
> 
>   The better fix is to allow certain authorities to specify that they are
>   voting on bandwidth "offsets": how much they think the weight should
>   be changed for the relay in question. We should put the offset vote in
>   the stanza for the relay in question, so a given authority can choose
>   which relays to express preferences for and which not.

As Roger and I discussed on Friday, this seems needlessly complex.
The semantics you're suggesting seem to be that an authority can say:
   w Bandwidth=X
if they're not too sure, and 
   w Bandwidth=Y Offset=Z
if they're pretty sure that the real value of the bandwidth is Y+Z.

This is a big buggy, as noted.  Suppose that there are 3 authorities,
and they say about a single router:
       Bw=1000 Offset=0      (Total=1000)
       Bw=1500 Offset=0      (Total=1500)
       Bw=1000 Offset=500    (Total=1500)
Note that the algorithm described below will give a median Bw if 1000
and a median offset of 0, producing a declared bandwidth of 1000.  But
if instead we had taken the median of the actual observed totals, we
would have gotten a value of 1500.

It makes more sense just to let bandwidth mean bandwidth.  If we want
to have measured bandwidth count for more than reported bandwidth,
let's have an optional flag on the vote line that looks like:

   w Bandwidth=X Measured=1

This way the median actually -is- the median.  See below for my
suggested voting algorithm.

> 3. Security implications
> 
>   If only some authorities choose to vote on an offset, then a majority of
>   those voting authorities can arbitrarily change the bandwidth weighting
>   for the relay. At the extreme, if there's only one offset-voting
>   authority, then that authority can dictate which relays clients will
>   find attractive.
> 
>   This problem isn't entirely new: we already have the worry wrt
>   the subset of authorities that vote for BadExit.
> 
>   To make it not so bad, we should deploy at least three offset-voting
>   authorities.
> 
>   Also, authorities that know how to vote for offsets should vote for
>   an offset of zero for new nodes, rather than choosing not to vote on
>   any offset in those cases.

Suggested revision: If there are at least MIN_MEASURING (say, 3)
voters that say "measured" about a given banwidth observation, then we
take the median of their observations as the bandwidth of the router.
Otherwise, we proceed as in the current dir-spec.txt for that router.

> 4. Design
> 
>   First, we need a new consensus method to support this new calculation.
> 
>   Now v3 votes can have a new weight on the "w" line:
>     "Bandwidth_Offset=" INT.
>   Once we're using the new consensus method, the new way to compute the
>   Bandwidth weight is by taking the old vote (explained in proposal 141:
>   median, then choose the lower number in the case of ties), and adding
>   or subtracting the median offset (using the offset closer to 0 in the
>   case of ties, and with a sum of 0 if the sum is negative).
> 
>   Then the actual consensus looks just the same as it did before,
>   so clients never have to know that this additional calculation is
>   happening.

Here are some additional suggestions that came up as we were talking.

* We'd like to avoid having little changes in measured bandwidth
  result in changes to the consensus, since we'd like to be able to
  transfer consensus diffs.  Thus, let's round our votes to the
  first N significant bits.

  In other words, if we've observed a bandwidth of 28789 bytes for a
  node, that's  111 0000 0111 0101.   We round that down to 111 0000
  0000 0000, and declare 26872.

  This is better than rounding to the nearest 1k, since a 1k change is
  very significant for low values, and relatively frequent for high
  values.

* We want to have our declared values for node bandwidth lag our
  observations by a little.  If we observe that a node has a very high
  bandwidth, we don't want to declare that high bandwidth immediately,
  since a node with apparent high available bandwidth could have
  either a very high capacity, or a very unused capacity.  Instead, we
  want to adjust the bandwidth upwards more slowly, so that we don't
  oscillate between "notice that performance is good and declare the
  node as high-capacity, and get it hammered by requests; notice that
  performance is bad and declare the node as low-capacity and have
  users stop using it".

  (Mike came up with the idea for this one; I don't know if it's made
  it to or-dev yet.)

  My typical way for doing this would be for voters who are measuring
  bandwidth to declare the bandwidth of a node as a weighted average
  of the observation and the last declared bandwidth:

         Bw_new = (Bw_old * Alpha + Observation)/(Alpha + 1)

  Roger noted that with an approach like this, Alpha could be voted on
  like any other value, letting us adjust the lag parameters.

* It's really important that measurers try to normalize declared
  bandwidth values to the same scale.  Conceptually, a bandwidth as we
  use it in the consensus is just a weight on an arbitrary scale.  But
  if different authorities are voting for bandwidths using different
  scales -- for example, if one systematically underestimates by 10%
  -- then the one with the middle _scale_ will pretty much
  unilaterally set the weights for the routers.

  I don't have a great solution for this one, other than to try to
  make sure the bandwidth measurement algorithms don't have any
  systemic bias depending on which authority is running them from
  where on the Internet.  We _could_ normalize everything to a
  fraction, but that doesn't seem like it would give stable values,
  and combining these values would seem hard.

-- 
Nick