[tor-dev] Next version of the algorithm
Ola Bini
obini at thoughtworks.com
Thu Mar 3 15:44:47 UTC 2016
Great, lots of good comments. I'll respond in depth once the fever has
left my brain! =D
On Thu, Mar 03, 2016 at 03:15:26PM +0100, George Kadianakis wrote:
> Ola Bini <obini at thoughtworks.com> writes:
>
> > Hi,
>
> Here are some more comments to the latest version of the proposal, as seen here:
> https://github.com/twstrike/torspec
>
> > Filename: 259-guard-selection.txt
> > Title: New Guard Selection Behaviour
> > Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
> > Created: 2015-10-28
> > Status: Draft
> >
> > §1. Overview
> >
> > Tor uses entry guards to prevent an attacker who controls some
> > fraction of the network from observing a fraction of every user's
> > traffic. If users chose their entries and exits uniformly at
> > random from the list of servers every time they build a circuit,
> > then an adversary who had (k/N) of the network would deanonymize
> > F=(k/N)^2 of all circuits... and after a given user had built C
> > circuits, the attacker would see them at least once with
> > probability 1-(1-F)^C. With large C, the attacker would get a
> > sample of every user's traffic with probability 1.
> >
> > To prevent this from happening, Tor clients choose a small number of
> > guard nodes (currently 3). These guard nodes are the only nodes
> > that the client will connect to directly. If they are not
> > compromised, the user's paths are not compromised.
> >
> > But attacks remain. Consider an attacker who can run a firewall
> > between a target user and the Tor network, and make
> > many of the guards they don't control appear to be unreachable.
> > Or consider an attacker who can identify a user's guards, and mount
> > denial-of-service attacks on them until the user picks a guard
> > that the attacker controls.
> >
> > In the presence of these attacks, we can't continue to connect to
> > the Tor network unconditionally. Doing so would eventually result
> > in the user choosing a hostile node as their guard, and losing
> > anonymity.
> >
> > This proposal outlines a new entry guard selection algorithm, which
> > addresses the following concerns:
> >
> > - Heuristics and algorithms for determining how and which guard(s)
> > is(/are) chosen should be kept as simple and easy to understand
> > as possible.
> >
> > - Clients in censored regions or who are behind a fascist firewall
> > who connect to the Tor network should not experience any
> > significant disadvantage in terms of reachability or usability.
> >
> > - Tor should make a best attempt at discovering the most
> > appropriate behaviour, with as little user input and
> > configuration as possible.
> >
> >
> > §2. Design
> >
> > Alice, an OP attempting to connect to the Tor network, should
> > undertake the following steps to determine information about the
> > local network and to select (some) appropriate entry guards. In the
> > following scenario, it is assumed that Alice has already obtained a
> > recent, valid, and verifiable consensus document.
> >
> > The algorithm is divided into four components such that the full
> > algorithm is implemented by first invoking START, then repeatedly
> > calling NEXT while adviced it SHOULD_CONTINUE and finally calling
> > END. For an example usage see §A. Appendix.
> >
> > Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
> > is used for the algorithm to be able to tell the caller whether we
> > consider the work done or not - this can be used to retry primary
> > guards when we finally are able to connect to a guard after a long
> > network outage, for example.
> >
> > This algorithm keeps track of the unreachability status for guards
> > in state private to the algorithm - this is re-initialized every time
> > START is called.
> >
>
> Hmm, didn't we decide to persist the unreachability status over runs, right?
> Or not?
>
> The argument for doing so is "What happens if I'm a hidden service that needs to
> make 5 circuits per second, and my first 2 guards happen to be offline? Do I
> have to spend time probing them everytime I make a new circuit?".
>
> > The algorithm expects several arguments to guide its behavior. These
> > will be defined in §2.1.
> >
> > The goal of this algorithm is to strongly prefer connecting to the
> > same guards we have connected to before, while also trying to detect
> > conditions such as a network outage or a network environment that
> > blocks most ports. The way it does this is by keeping track of how
> > many guards we have exposed ourselves to, and if we have connected
> > to too many we will fall back to only retrying the ones we have
> > already tried. The algorithm also decides on sample set that should
> > be persisted - in order to minimize the risk of an attacker forcing
> > enumeration of the whole network by triggering rebuilding of
> > circuits.
> >
> >
> > §2.1. The START algorithm
> >
> > In order to start choosing an entry guard, use the START
> > algorithm. This takes four arguments that can be used to fine tune
> > the workings:
> >
> > USED_GUARDS
> > This is a list that contains all the guards that have been used
> > before by this client. We will prioritize using guards from this
> > list in order to minimize our exposure. The list is expected to
> > be sorted based on priority, where the first entry will have the
> > highest priority.
> >
> > SAMPLED_UTOPIC_GUARDS
> > This is a set that contains all guards that should be considered
> > for connection under utopic conditions. This set should be
> > persisted between runs. It will be filled in by the algorithm if
> > it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
> > guards after winnowing out older guards. It should be filled by
> > using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument.
> >
>
> Should we use UTOPIC_GUARDS or REMAINING_UTOPIC_GUARDS as the argument?
>
> > SAMPLED_DYSTOPIC_GUARDS
> > This is a set that contains all guards that should be considered
> > for connection under dystopic conditions. This set should be
> > persisted between runs. It will be filled in by the algorithm if
> > it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
> > guards after winnowing out older guards. It should be filled by
> > using NEXT_BY_BANDWIDTH with DYSTOPIC_GUARDS as an argument.
> >
> > EXCLUDE_NODES
> > A set of nodes that we should not consider using as a guard.
> >
> > N_PRIMARY_GUARDS
> > The number of guards we should consider our primary
> > guards. These guards will be retried more frequently and will
> > take precedence in most situations. By default the primary
> > guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
> >
> > DIR
> > If this argument is set, we should only consider guards that can
> > be directory guards. If not set, we will consider all guards.
> >
> > The primary work of START is to initialize the state machine depicted
> > in §2.2 The NEXT algorithm.
> > The initial state of the machine is defined by:
> >
> > GUARDS
> > This is a set of all guards from the consensus, without
> > EXCLUDE_NODES and potentially filtered if DIR is set.
> >
> > UTOPIC_GUARDS
> > This is a set of all guards to use under utopic conditions. It
> > will primarily be used to fill in SAMPLED_UTOPIC_GUARDS. This
> > set will be initialized to be the same as GUARDS.
> >
> > DYSTOPIC_GUARDS
> > This is a set of all guards to use under dystopic conditions
> > (usually when we are subject to a firewall that restricts the
> > ports we can connect to). It will primarily be used to fill in
> > SAMPLED_UTOPIC_GUARDS. This set will be initialized to be the
>
> I guess you mean SAMPLED_DYSTOPIC_GUARDS.
>
> > subset of GUARDS that listen to ports that are allowed by
> > dystopic conditions.
> >
> > REMAINING_UTOPIC_GUARDS
> > This is a running set of the utopic guards we have not yet tried
> > to connect to. It should be initialized to be SAMPLED_UTOPIC_GUARDS
> > without USED_GUARDS.
> >
>
> Maybe here we should also mention that we will reinsert guards that we have not
> tried in a long time (GUARDS_RETRY_TIME) as specified by 2.2.2?
>
> > REMAINING_DYSTOPIC_GUARDS
> > This is a running set of the dystopic guards we have not yet tried
> > to connect to. It should be initialized to be SAMPLED_DYSTOPIC_GUARDS
> > without USED_GUARDS.
> >
> > TRIED_GUARDS
> > This set keeps track of all utopic guards we have tried connecting
> > to. This should be initialized to the empty set.
> >
> > TRIED_DYSTOPIC_GUARDS
> > This set keeps track of all dystopic guards we have tried connecting
> > to. This should be initialized to the empty set.
> >
> > STATE
> > A variable that keeps track of which state in the state
> > machine we are currently in. It should be initialized to
> > STATE_PRIMARY_GUARDS.
> >
> > PRIMARY_GUARDS
> > This list keeps track of our primary guards. These are guards
> > that we will prioritize when trying to connect, and will also
> > retry more often in case of failure with other guards.
> > It should be initialized by calling algorithm
> > NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
> > N_PRIMARY_GUARDS elements.
> >
> >
> > §2.2. The NEXT algorithm
> >
> > The NEXT algorithm is composed of several different possibly flows. The
> > first one is a simple state machine that can transfer between four
> > different states. Every time NEXT is invoked, it will resume at the
> > state where it left off previously. In the course of selecting an
> > entry guard, a new consensus can arrive. When that happens we need
> > to update the data structures used, but nothing else should change.
> >
> > Before jumping in to the state machine, we should first check if it
> > was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried
> > any of the PRIMARY_GUARDS. If this is the case, and we are not in
> > STATE_PRIMARY_GUARDS, we should save the previous state and set the
> > state to STATE_PRIMARY_GUARDS.
> >
> >
> > §2.2.1. The STATE_PRIMARY_GUARDS state
> >
> > Return each entry in PRIMARY_GUARDS in turn. For each entry, if it
> > was not possible to connect to it, mark the entry as unreachable,
> > and add it to TRIED_GUARDS.
> > [XXX defining "was not possible to connect" as "entry is not live" according
> > to current definition of "live entry guard" in tor source code, seems
> > to improve success rate on the flaky network scenario.
> > See: https://github.com/twstrike/tor_guardsim/issues/1#issuecomment-187374942]
> >
>
> Hmm, I'm not sure what this XXX means exactly. I believe we should actually try
> to _connect_ to those primary guards and not just check if we think they are live.
>
> > If all entries have been tried, restore the previous state and go
> > there. If there is no previous state, transition to STATE_TRY_UTOPIC.
> >
> >
> > §2.2.2. The STATE_TRY_UTOPIC state
> >
> > In order to give guards that have been marked as unreachable a
> > chance to come back, add all entries in TRIED_GUARDS that were
> > marked as unreachable more than GUARDS_RETRY_TIME minutes ago back
> > to REMAINING_UTOPIC_GUARDS.
> >
>
> I'm a bit puzzled by this mechanism. Maybe it's benefits can be explained a bit
> more clearly?
>
> When we add guards back to REMAINING_UTOPIC_GUARDS, do we also remove them from
> TRIED_GUARDS?
>
> Also, the value here is currently 20 minutes. So this will trigger only when
> this algorithm takes over 20 minutes to terminate. I guess this should only
> happen when the network is down.
>
> Now that we have persistent SAMPLED_UTOPIC_GUARDS is this still useful? Won't
> we have fully populated our SAMPLED_*_GUARDS structures by the point this rule
> triggers?
>
> Sorry for the confusion :)
>
> > Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
> > turn. For each entry, if it was not possible to connect to it, mark
> > the entry as unreachable and add it to TRIED_GUARDS.
> >
> > Return each entry from REMAINING_UTOPIC_GUARDS using
> > NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect
> > to it, remove the entry from REMAINING_UTOPIC_GUARDS, mark it as
> > unreachable and add it to TRIED_GUARDS.
> >
> > If no entries remain in REMAINING_UTOPIC_GUARDS, transition to
> > STATE_TRY_DYSTOPIC.
> >
> >
> > §2.2.3. The STATE_TRY_DYSTOPIC state
> >
> > In order to give guards that have been marked as unreachable a
> > chance to come back, add all entries in TRIED_DYSTOPIC_GUARDS that were
> > marked as unreachable more than GUARDS_RETRY_TIME minutes ago back
> > to REMAINING_DYSTOPIC_GUARDS.
> >
> > Return each dystopic entry in USED_GUARDS that is not in PRIMARY_GUARDS in
> > turn. For each entry, if it was not possible to connect to it, mark
> > the entry as unreachable and add it to TRIED_DYSTOPIC_GUARDS.
> >
> > Return each entry from REMAINING_DYSTOPIC_GUARDS using
> > NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect
> > to it, remove the entry from REMAINING_DYSTOPIC_GUARDS, mark it as
> > unreachable and add it to TRIED_DYSTOPIC_GUARDS.
> >
> > If no entries remain in REMAINING_DYSTOPIC_GUARDS, transition to
> > STATE_PRIMARY_GUARDS.
> >
> >
> > §2.2.5. ON_NEW_CONSENSUS
> >
> > First, ensure that all guard profiles are updated with information
> > about whether they were in the newest consensus or not. If not, the
> > guard is considered bad.
> >
>
> Maybe instead of "If not" we could say "If a guard is not included in the newest
> consensus" to make it a bit clearer.
>
> > If any PRIMARY_GUARDS have become bad, remove the guard from
> > PRIMARY_GUARDS. Then ensure that PRIMARY_GUARDS contain
> > N_PRIMARY_GUARDS entries by repeatedly calling NEXT_PRIMARY_GUARD.
> >
> > If any guards in USED_GUARDS have switched from being bad to being
> > non-bad, add it back in the place it should have been in
> > PRIMARY_GUARDS if it had been non-bad when populating
> > PRIMARY_GUARDS. If this results in PRIMARY_GUARDS being larger than
> > N_PRIMARY_GUARDS, truncate the list to be N_PRIMARY_GUARDS entries
> > long.
> > [XXX Does "add it back in the place it should have been in PRIMARY_GUARDS
> > if it had been non-bad" implies keeping original order?]
> >
>
> If I understand correctly, I think the answer to this XXX is "Ideally, yes.".
>
> I'm curious to see how this mechanism will be implemented because it's important
> and it would be nice if it's done cleanly.
>
> Also, we should be careful about when we count 'bad' guards. After a few weeks
> of operation, the USED_GUARDS list can accumulate multiple bad guards, and we
> should make sure we don't count them when we do our threshold checks.
>
> >
> > §2.3. The SHOULD_CONTINUE algorithm
> >
> > This algorithm takes as an argument a boolean indicating whether the
> > circuit was successfully built or not.
> >
> > After the caller have tried to build a circuit with a returned
> > guard, they should invoke SHOULD_CONTINUE to understand if the
> > algorithm is finished or not. SHOULD_CONTINUE will always return
> > true if the circuit failed. If the circuit succeeded,
> > SHOULD_CONTINUE will always return false, unless the guard that
> > succeeded was the first guard to succeed after
> > INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
> > state to STATE_PRIMARY_GUARDS and return true.
> >
>
> ACK.
>
> Just a reminder that we also discussed adding the "Retry primary guards if we
> have looped over the whole guardlist" heuristic somewhere here. Because in many
> cases the network can go down and then back up in less than a minute.
>
> >
> > §2.4. The END algorithm
> >
> > The goal of this algorithm is simply to make sure that we keep track
> > of successful connections made. This algorithm should be invoked
> > with the guard that was used to correctly set up a circuit.
> >
> > Once invoked, this algorithm will mark the guard as used, and make
> > sure it is in USED_GUARDS.
> > [XXX if the guard is not in USED_GUARDS should be added *first*?
> > Important because it will affect on the building of PRIMARY_GUARDS.]
> >
>
> IIUC, if the guard is not in USED_GUARDS it should be added *last* (that is,
> with lowest priority).
>
> >
> > §2.5. Helper algorithms
> >
> > These algorithms are used in the above algorithms, but have been
> > separated out here in order to make the flow clearer.
> >
> > NEXT_PRIMARY_GUARD
> > - Return the first entry from USED_GUARDS that is not in
> > PRIMARY_GUARDS and that is in the most recent consensus.
> > - If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
> > REMAINING_UTOPIC_GUARDS as the argument.
> >
> > NEXT_BY_BANDWIDTH
> > - Takes G as an argument, which should be a set of guards to
> > choose from.
> > - Return a randomly select element from G, weighted by bandwidth.
> >
> >
> > §3. Consensus Parameters, & Configurable Variables
> >
> > This proposal introduces several new parameters that ideally should
> > be set in the consensus but that should also be possible to
> > set or override in the client configuration file. Some of these have
> > proposed values, but for others more simulation and trial needs to
> > happen.
> >
> > PRIMARY_GUARDS_RETRY_INTERVAL
> > In order to make it more likely we connect to a primary guard,
> > we would like to retry the primary guards more often than other
> > types of guards. This parameter controls how many minutes should
> > pass before we consider retrying primary guards again. The
> > proposed value is 3.
> >
> > GUARDS_RETRY_TIME
> > In the normal course of selecting guards, we want to continue
> > retrying unreachable guards in case they have become
> > reachable. This parameter controls the minimum amount of minutes
> > before we should start to consider guards for connecting
> > again. Proposed value is 20.
> >
> > SAMPLE_SET_THRESHOLD
> > In order to allow us to recognize dystopic situations or a
> > completely unreachable network, we would like to avoid
> > connecting to too many guards before switching modes. We also
> > want to avoid exposing ourselves to too many nodes in a
> > potentially hostile situation. This parameter, expressed as a
> > fraction, determines the number of guards we should keep as the
> > sampled set of the only guards we will consider connecting
> > to. It will be used as a fraction for both the utopic and the
> > dystopic sampled set. If we assume there are 1900 utopic guards
> > and of them there are 500 dystopic guards, a setting of 0.02
> > means we will have a sample set of 38 utopic guards and 10
> > dystopic guards. This limits our total exposure. Proposed value
> > is 0.02.
> >
>
> We should decide if we want to actually use a dynamic percentage here, or just
> set the threshold to a constant value.
>
> A dynamic percentage might give us better security and reachability as the
> network evolves, but might also cause unpredictable behaviors if we suddently
> get too many guards or too many of them disappear.
>
> I don't have a strong opinion here.
>
> > INTERNET_LIKELY_DOWN_INTERVAL
> > The number of minutes since we started trying to find an entry
> > guard before we should consider the network down and consider
> > retrying primary guards before using a functioning guard
> > found. Proposed value 20.
> >
>
> It seems to me that the value 20 here could get reduced to something like 5 or
> even less. Of course 5 is also an arbitrary value and to actually find out the
> "best" number here we should test the algorithm ourselves in various network
> types.
>
> Let's say we are restricting ourselves to SAMPLE_SET_THRESHOLD guards. If the
> network is down, we will start cycling through guards and we will have walked
> through all of them in a matter of minutes. After we have exhausted our guard
> list once and we did not manage to connect, our network is effectively
> down. This might give extra reasons to do the "Retry guards if we have exhausted
> our guard list once" heuristic.
>
> > §4. Security properties and behavior under various conditions
> >
> > Under normal conditions, this algorithm will allow us to quickly
> > connect and use guards we have used before with high likelihood of
> > working. Assuming the first primary guard is reachable and in the
> > consensus, this algorithm will deterministically always return that
> > guard.
> >
> > Under dystopic conditions (when a firewall is in place that blocks
> > all ports except for potentially port 80 and 443), this algorithm
> > will try to connect to 2% of all guards before switching modes to try
> > dystopic guards. Currently, that means trying to connect to circa 40
> > guards before getting a successful connection. If we assume a
> > connection try will take maximum 10 seconds, that means it will take
> > up to 6 minutes to get a working connection.
> >
> > When the network is completely down, we will try to connect to 2% of
> > all guards plus 2% of all dystopic guards before realizing we are
> > down. This means circa 50 guards tried assuming there are 1900 guards
> > in the network.
> >
> > In terms of exposure, we will connect to a maximum of 2% of all
> > guards plus 2% of all dystopic guards, or 3% of all guards,
> > whichever is lower. If N is the number of guards, and k is the
> > number of guards an attacker controls, that means an attacker would
> > have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their
> > guards selected before we fall back. In real terms, this means an
> > attacker would need to control over 10% of all guards in order to
> > have a larger than 50% chance of controlling a guard for any given client.
> >
> > In addition, since the sampled set changes slowly (the suggestion
> > here is that guards in it expire every month) it is not possible for
> > an attacker to force a connection to an entry guard that isn't
> > already in the users sampled set.
> >
> >
> > §A. Appendix: An example usage
> >
> > In order to clarify how this algorithm is supposed to be used, this
> > pseudo code illustrates the building of a circuit:
> >
> > OPEN_CIRCUIT:
> > context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards, [], 3, false)
> >
> > while True:
> > entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
> > circuit = composeCircuitAndConnect(entryGuard)
> >
> > if not SHOULD_CONTINUE(isSuccessful(circuit)):
> > ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard)
> > return circuit
> >
> > -*- coding: utf-8 -*-
>
--
Ola Bini (https://olabini.se)
"Yields falsehood when quined" yields falsehood when quined.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 931 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160303/92c5229e/attachment-0001.sig>
More information about the tor-dev
mailing list