[tor-dev] Proposal 259: New Guard Selection Behaviour
George Kadianakis
desnacked at riseup.net
Thu Mar 24 11:55:52 UTC 2016
Hello,
proposal 259 has evolved plenty since the last time it was posted on
[tor-dev]. All the technical discussion can be found in various threads, and
here is a thread about the proposal itself as it is now.
The current proposal is pretty much done, and it's currently being implemented
for testing. We imagine that during testing we might have to slightly alter the
algorithm and its parameters to improve performance and security.
Here are some behaviors that are still unspecified in the current proposal
based on our discussion yesterday:
- What exactly should be done if a circuit has restrictred requirements?
(e.g. it needs a Fast/Stable guard, but our top guard is not Fast/Stable)
- How exactly should directory guards work? Should the guard lists be
initialized only with directory guards? Or should we just initialize our
guard lists from the set of all guards, and just skip non-V2Dir guards
whenever we need to make a directory circuit? FWIW, there are currently 2076
guards, out of which 1659 support V2Dir (i.e. they are directory guards).
- How should the algorithm work wrt ReachableAddresses? I wonder how the
current algorithm work wrt ReachableAddresses and if it's behavior is good.
---
Filename: 259-guard-selection.txt
Title: New Guard Selection Behaviour
Author: Isis Lovecruft, George Kadianakis, [Ola Bini]
Created: 2015-10-28
Status: Draft
§1. Overview
Tor uses entry guards to prevent an attacker who controls some
fraction of the network from observing a fraction of every user's
traffic. If users chose their entries and exits uniformly at
random from the list of servers every time they build a circuit,
then an adversary who had (k/N) of the network would deanonymize
F=(k/N)^2 of all circuits... and after a given user had built C
circuits, the attacker would see them at least once with
probability 1-(1-F)^C. With large C, the attacker would get a
sample of every user's traffic with probability 1.
To prevent this from happening, Tor clients choose a small number of
guard nodes (currently 3). These guard nodes are the only nodes
that the client will connect to directly. If they are not
compromised, the user's paths are not compromised.
But attacks remain. Consider an attacker who can run a firewall
between a target user and the Tor network, and make
many of the guards they don't control appear to be unreachable.
Or consider an attacker who can identify a user's guards, and mount
denial-of-service attacks on them until the user picks a guard
that the attacker controls.
In the presence of these attacks, we can't continue to connect to
the Tor network unconditionally. Doing so would eventually result
in the user choosing a hostile node as their guard, and losing
anonymity.
This proposal outlines a new entry guard selection algorithm, which
addresses the following concerns:
- Heuristics and algorithms for determining how and which guard(s)
is(/are) chosen should be kept as simple and easy to understand
as possible.
- Clients in censored regions or who are behind a fascist firewall
who connect to the Tor network should not experience any
significant disadvantage in terms of reachability or usability.
- Tor should make a best attempt at discovering the most
appropriate behaviour, with as little user input and
configuration as possible.
§2. Design
Alice, an OP attempting to connect to the Tor network, should
undertake the following steps to determine information about the
local network and to select (some) appropriate entry guards. In the
following scenario, it is assumed that Alice has already obtained a
recent, valid, and verifiable consensus document.
The algorithm is divided into four components such that the full
algorithm is implemented by first invoking START, then repeatedly
calling NEXT while adviced it SHOULD_CONTINUE and finally calling
END. For an example usage see §A. Appendix.
Several components of NEXT can be invoked asynchronously. SHOULD_CONTINUE
is used for the algorithm to be able to tell the caller whether we
consider the work done or not - this can be used to retry primary
guards when we finally are able to connect to a guard after a long
network outage, for example.
This algorithm keeps track of the unreachability status for guards
in state global to the system, so that repeated runs will not have
to rediscover unreachability over and over again. However, this
state does not need to be persisted permanently - it is purely an
optimization.
The algorithm expects several arguments to guide its behavior. These
will be defined in §2.1.
The goal of this algorithm is to strongly prefer connecting to the
same guards we have connected to before, while also trying to detect
conditions such as a network outage or a network environment that
blocks most ports. The way it does this is by keeping track of how
many guards we have exposed ourselves to, and if we have connected
to too many we will fall back to only retrying the ones we have
already tried. The algorithm also decides on sample set that should
be persisted - in order to minimize the risk of an attacker forcing
enumeration of the whole network by triggering rebuilding of
circuits.
§2.1. The START algorithm
In order to start choosing an entry guard, use the START
algorithm. This takes four arguments that can be used to fine tune
the workings:
USED_GUARDS
This is a list that contains all the guards that have been used
before by this client. We will prioritize using guards from this
list in order to minimize our exposure. The list is expected to
be sorted based on priority, where the first entry will have the
highest priority.
SAMPLED_UTOPIC_GUARDS
This is a set that contains all guards that should be considered
for connection under utopic conditions. This set should be
persisted between runs. It will be filled in by the algorithm if
it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
guards after winnowing out older guards. It should be filled by
using NEXT_BY_BANDWIDTH with UTOPIC_GUARDS as an argument.
SAMPLED_DYSTOPIC_GUARDS
This is a set that contains all guards that should be considered
for connection under dystopic conditions. This set should be
persisted between runs. It will be filled in by the algorithm if
it's empty, or if it contains less than SAMPLE_SET_THRESHOLD
guards after winnowing out older guards. It should be filled by
using NEXT_BY_BANDWIDTH with DYSTOPIC_GUARDS as an argument.
EXCLUDE_NODES
A set of nodes that we should not consider using as a guard.
N_PRIMARY_GUARDS
The number of guards we should consider our primary
guards. These guards will be retried more frequently and will
take precedence in most situations. By default the primary
guards will be the first N_PRIMARY_GUARDS guards from USED_GUARDS.
DIR
If this argument is set, we should only consider guards that can
be directory guards. If not set, we will consider all guards.
The primary work of START is to initialize the state machine depicted
in §2.2. The initial state of the machine is defined by:
GUARDS
This is a set of all guards from the consensus, without
EXCLUDE_NODES and potentially filtered if DIR is set.
UTOPIC_GUARDS
This is a set of all guards to use under utopic conditions. It
will primarily be used to fill in SAMPLED_UTOPIC_GUARDS. This
set will be initialized to be the same as GUARDS.
DYSTOPIC_GUARDS
This is a set of all guards to use under dystopic conditions
(usually when we are subject to a firewall that restricts the
ports we can connect to). It will primarily be used to fill in
SAMPLED_DYSTOPIC_GUARDS. This set will be initialized to be the
subset of GUARDS that listen to ports that are allowed by
dystopic conditions.
REMAINING_UTOPIC_GUARDS
This is a running set of the utopic guards we have not yet tried
to connect to. It should be initialized to be
SAMPLED_UTOPIC_GUARDS without USED_GUARDS.
REMAINING_DYSTOPIC_GUARDS
This is a running set of the dystopic guards we have not yet
tried to connect to. It should be initialized to be
SAMPLED_DYSTOPIC_GUARDS without USED_GUARDS.
STATE
A variable that keeps track of which state in the state
machine we are currently in. It should be initialized to
STATE_PRIMARY_GUARDS.
PRIMARY_GUARDS
This list keeps track of our primary guards. These are guards
that we will prioritize when trying to connect, and will also
retry more often in case of failure with other guards.
It should be initialized by calling algorithm
NEXT_PRIMARY_GUARD repeatedly until PRIMARY_GUARDS contains
N_PRIMARY_GUARDS elements.
§2.2. The NEXT algorithm
The NEXT algorithm is composed of several different possibly flows. The
first one is a simple state machine that can transfer between four
different states. Every time NEXT is invoked, it will resume at the
state where it left off previously. In the course of selecting an
entry guard, a new consensus can arrive. When that happens we need
to update the data structures used, but nothing else should change.
Before jumping in to the state machine, we should first check if it
was at least PRIMARY_GUARDS_RETRY_INTERVAL minutes since we tried
any of the PRIMARY_GUARDS. If this is the case, and we are not in
STATE_PRIMARY_GUARDS, we should save the previous state and set the
state to STATE_PRIMARY_GUARDS.
§2.2.1. The STATE_PRIMARY_GUARDS state
Return each entry in PRIMARY_GUARDS in turn. For each entry, if it
was not possible to connect to it and mark the entry as unreachable
[XXX defining "was not possible to connect" as "entry is not live" according
to current definition of "live entry guard" in tor source code, seems
to improve success rate on the flaky network scenario.
See: https://github.com/twstrike/tor_guardsim/issues/1#issuecomment-187374942]
If all entries have been tried, restore the previous state and go
there. If there is no previous state, transition to STATE_TRY_UTOPIC.
§2.2.2. The STATE_TRY_UTOPIC state
Return each entry in USED_GUARDS that is not in PRIMARY_GUARDS in
turn. For each entry, if it was not possible to connect to it and mark
the entry as unreachable.
Return each entry from REMAINING_UTOPIC_GUARDS using
NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect
to it, remove the entry from REMAINING_UTOPIC_GUARDS and mark it as
unreachable.
If no entries remain in REMAINING_UTOPIC_GUARDS, transition to
STATE_TRY_DYSTOPIC.
§2.2.3. The STATE_TRY_DYSTOPIC state
Return each entry from REMAINING_DYSTOPIC_GUARDS using
NEXT_BY_BANDWIDTH. For each entry, if it was not possible to connect
to it, remove the entry from REMAINING_DYSTOPIC_GUARDS and mark it as
unreachable.
If no entries remain in REMAINING_DYSTOPIC_GUARDS, transition to
STATE_PRIMARY_GUARDS.
§2.2.5. ON_NEW_CONSENSUS
First, ensure that all guard profiles are updated with information
about whether they were in the newest consensus or not. If a guard
is not included in the newest consensus, the guard is considered
bad.
If any PRIMARY_GUARDS have become bad, remove the guard from
PRIMARY_GUARDS. Then ensure that PRIMARY_GUARDS contain
N_PRIMARY_GUARDS entries by repeatedly calling NEXT_PRIMARY_GUARD.
If any guards in USED_GUARDS have switched from being bad to being
non-bad, add it back in the place it should have been in
PRIMARY_GUARDS if it had been non-bad when populating
PRIMARY_GUARDS. If this results in PRIMARY_GUARDS being larger than
N_PRIMARY_GUARDS, truncate the list to be N_PRIMARY_GUARDS entries
long.
§2.3. The SHOULD_CONTINUE algorithm
This algorithm takes as an argument a boolean indicating whether the
circuit was successfully built or not.
After the caller have tried to build a circuit with a returned
guard, they should invoke SHOULD_CONTINUE to understand if the
algorithm is finished or not. SHOULD_CONTINUE will always return
true if the circuit failed. If the circuit succeeded,
SHOULD_CONTINUE will always return false, unless the guard that
succeeded was the first guard to succeed after
INTERNET_LIKELY_DOWN_INTERVAL minutes - in that case it will set the
state to STATE_PRIMARY_GUARDS and return true.
§2.4. The END algorithm
The goal of this algorithm is simply to make sure that we keep track
of successful connections made. This algorithm should be invoked
with the guard that was used to correctly set up a circuit.
Once invoked, this algorithm will mark the guard as used, and make
sure it is in USED_GUARDS, by adding it at the end if it was not there.
§2.5. Helper algorithms
These algorithms are used in the above algorithms, but have been
separated out here in order to make the flow clearer.
NEXT_PRIMARY_GUARD
- Return the first entry from USED_GUARDS that is not in
PRIMARY_GUARDS and that is in the most recent consensus.
- If USED_GUARDS is empty, use NEXT_BY_BANDWIDTH with
REMAINING_UTOPIC_GUARDS as the argument.
NEXT_BY_BANDWIDTH
- Takes G as an argument, which should be a set of guards to
choose from.
- Return a randomly select element from G, weighted by bandwidth.
§3. Consensus Parameters, & Configurable Variables
This proposal introduces several new parameters that ideally should
be set in the consensus but that should also be possible to
set or override in the client configuration file. Some of these have
proposed values, but for others more simulation and trial needs to
happen.
PRIMARY_GUARDS_RETRY_INTERVAL
In order to make it more likely we connect to a primary guard,
we would like to retry the primary guards more often than other
types of guards. This parameter controls how many minutes should
pass before we consider retrying primary guards again. The
proposed value is 3.
SAMPLE_SET_THRESHOLD
In order to allow us to recognize dystopic situations or a
completely unreachable network, we would like to avoid
connecting to too many guards before switching modes. We also
want to avoid exposing ourselves to too many nodes in a
potentially hostile situation. This parameter, expressed as a
fraction, determines the number of guards we should keep as the
sampled set of the only guards we will consider connecting
to. It will be used as a fraction for both the utopic and the
dystopic sampled set. If we assume there are 1900 utopic guards
and of them there are 500 dystopic guards, a setting of 0.02
means we will have a sample set of 38 utopic guards and 10
dystopic guards. This limits our total exposure. Proposed value
is 0.02.
INTERNET_LIKELY_DOWN_INTERVAL
The number of minutes since we started trying to find an entry
guard before we should consider the network down and consider
retrying primary guards before using a functioning guard
found. Proposed value 5.
§4. Security properties and behavior under various conditions
Under normal conditions, this algorithm will allow us to quickly
connect and use guards we have used before with high likelihood of
working. Assuming the first primary guard is reachable and in the
consensus, this algorithm will deterministically always return that
guard.
Under dystopic conditions (when a firewall is in place that blocks
all ports except for potentially port 80 and 443), this algorithm
will try to connect to 2% of all guards before switching modes to try
dystopic guards. Currently, that means trying to connect to circa 40
guards before getting a successful connection. If we assume a
connection try will take maximum 10 seconds, that means it will take
up to 6 minutes to get a working connection.
When the network is completely down, we will try to connect to 2% of
all guards plus 2% of all dystopic guards before realizing we are
down. This means circa 50 guards tried assuming there are 1900 guards
in the network.
In terms of exposure, we will connect to a maximum of 2% of all
guards plus 2% of all dystopic guards, or 3% of all guards,
whichever is lower. If N is the number of guards, and k is the
number of guards an attacker controls, that means an attacker would
have a probability of 1-(1-(k/N)^2)^(N * 0.03) to have one of their
guards selected before we fall back. In real terms, this means an
attacker would need to control over 10% of all guards in order to
have a larger than 50% chance of controlling a guard for any given client.
In addition, since the sampled set changes slowly (the suggestion
here is that guards in it expire every month) it is not possible for
an attacker to force a connection to an entry guard that isn't
already in the users sampled set.
§A. Appendix: An example usage
In order to clarify how this algorithm is supposed to be used, this
pseudo code illustrates the building of a circuit:
OPEN_CIRCUIT:
context = ALGO_CHOOSE_ENTRY_GUARD_START(used_guards,
sampled_utopic_guards=[],
sampled_dystopic_guards=[],
exclude_nodes=[],
n_primary_guards=3,
dir=false,
guards_in_consensus)
while True:
entryGuard = ALGO_CHOOSE_ENTRY_GUARD_NEXT(context)
circuit = composeCircuitAndConnect(entryGuard)
if not SHOULD_CONTINUE(isSuccessful(circuit)):
ALGO_CHOOSE_ENTRY_GUARD_END(context, entryGuard)
return circuit
-*- coding: utf-8 -*-
More information about the tor-dev
mailing list