[tor-dev] Update to Proposal 316: FlashFlow

Thu Oct 8 18:34:35 UTC 2020

Hi!  We had a meeting about FlashFlow today, including several of the
authors.  Here are the notes we wound up with for ideas and next
steps.

Easy changes:
    * Just use a PRNG; assume we can make them arbitrarily fast.
(example candidates: chacha8, shake128.)
    * Use relay identities as the identifiers for measurers, so that
we won't need a novel authentication scheme.
    * We can't call the list of measurer IDs a "network parameter",
since technically speaking network parameters have to be integers.  It
will have to be a different part of the consensus header.
    * Make sure that all of the declared ranges for network parameters
are as wide as they could possibly need to be, since a parameter's
range is hard to widen later.

Trickier but straightforward:
    * Describe how to avoid collisions with multiple coordinators
       - idea: exactly how it's specified in the paper ;) but a
simpler idea ...
       - idea: coordinator 1 measures on day 1, coordinator 2 on day 2,
... and so on for all coordinators, then repeat
    * Describe how to aggregate all background measurements over the
full 30 seconds, and how to use that data.  (This may lower accuracy a
little, but makes some kinds of traffic analysis harder.) Idea: the
relay reports *once*, at the end of the measurement, the total amount
of background traffic, and the coordinator simply divides that by the
length of the measurement to get a per-second average.
    * Mention whether relays should reserve sockets in case they get measured
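The "one coordinator per day, then repeat" idea can be sketched as a pure function of the date, so coordinators can agree on whose turn it is without exchanging schedules. Assumptions: coordinators share one agreed list (here ordered by sorting their identifiers) and days are counted from the Unix epoch; the function name and both choices are hypothetical.

```python
import datetime
from typing import Sequence


def active_coordinator(coordinators: Sequence[str], day: datetime.date) -> str:
    """Return which coordinator may measure on `day`, rotating round-robin by day."""
    if not coordinators:
        raise ValueError("need at least one coordinator")
    ordered = sorted(coordinators)  # every coordinator derives the same order
    day_number = (day - datetime.date(1970, 1, 1)).days
    return ordered[day_number % len(ordered)]
```

With three coordinators, each one gets every third day, so a relay may wait up to two days for a given coordinator's turn.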
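The end-of-measurement background report reduces to a single division on the coordinator. A minimal sketch, assuming the relay reports one total byte count and the measurement lasted the 30 seconds mentioned above; the function and argument names are hypothetical.

```python
def background_per_second(total_bg_bytes: int, duration_s: float = 30.0) -> float:
    """Average background traffic rate computed from one end-of-measurement report."""
    if duration_s <= 0:
        raise ValueError("measurement duration must be positive")
    return total_bg_bytes / duration_s
```

For example, a report of 1,500,000,000 bytes over 30 seconds gives 50,000,000 bytes per second. The coordinator never sees the per-second shape of the background traffic, which is where the small accuracy loss comes from.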

More thinking may be needed:
    * Summarize ideas for how multiple coordinators don't have to
share full schedules with one another. Possibly divide up the network
by days? [e.g., Coordinator 1 measures nodes in set X on Monday]
    * Would it work to declare a maximum measurement fraction (e.g.
75% of bandwidth) but have measurers use that full fraction only in an
occasional measurement, and usually use less (e.g. 10% of
bandwidth)?
    * Discuss migration: how do we use this data when not all relays
support being measured in this way?
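One way to read the "declared maximum, usually less" question is a per-measurement choice of fraction. This is only a sketch of that reading: the 75% and 10% figures come from the example above, while the probability of using the maximum (`p_max = 0.05`) and the function name are invented for illustration.

```python
import random


def choose_measurement_fraction(
    rng: random.Random,
    max_frac: float = 0.75,
    usual_frac: float = 0.10,
    p_max: float = 0.05,
) -> float:
    """Pick the share of a relay's bandwidth the measurers use for one measurement."""
    # Hypothetical policy: use the declared maximum with probability p_max,
    # otherwise use the smaller everyday fraction.
    return max_frac if rng.random() < p_max else usual_frac
```

Passing a seeded `random.Random` keeps runs reproducible when testing a schedule.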

In terms of implementation:
- Identify the Python parts that differ from sbws, create sbws
subpackages "ff measurer" and "ff coordinator", and add a config option
to run in one mode or the other, so that we don't have yet another code
base to maintain.

In terms of deployment:
- We currently don't have any automatic way to ensure the network is
still "working properly", only some mostly manual methods and some
one-time experiments. This has left some relay operators unhappy, and
it has taken quite some time to figure out problems and solve them.

In terms of coordination:
We're deploying sbws on only one dirauth at a time and trying to ensure
the network is still "working properly".
If we deploy FlashFlow before we have deployed sbws on all bwauths and
ensured the network is still "working properly", it will be hard to
tell whether a problem is an sbws bug, a FlashFlow bug, or both.

FlashFlow, the Python code for the coordinator, measurer, etc.:
https://gitlab.torproject.org/pastly/flashflow

The rendered documentation for the above:
https://flashflow.pastly.xyz/

Tor repo with the branch:
https://gitlab.torproject.org/pastly/tor/-/tree/ff-v2

The ticket with the concerning graphs attributable to "Rob's speedtest thing"
https://trac.torproject.org/projects/tor/ticket/33076