[tor-dev] Detailed algorithm

Mon Feb 8 20:26:52 UTC 2016

> Hey,
> 
> So - thinking through your comments and thoughts, I realized that the
> better way to proceed was to describe the algorithm, and then put it
> into a proposal format. The following is updated with most of your
> comments and should be a bit clearer. I decided to remove #241 again,
> since it's unclear how it actually fits together with #259. The idea
> is that the algo is divided into three pieces. If failure is reported
> from the algorithm, the caller can wait for a while or just restart it
> from scratch immediately:
> 

I feel that this algorithm is a step in the right direction! It's interface
logic is quite similar to how Tor picks its guards, and it's more specific than
the previous proposals. It's a good first step so I'll CC tor-dev so that Nick,
Isis and other people can take a look if they want.

After we pin this down, we also need to collect all its security properties in
one place. For example, we should state clearly what security properties the
thresholds GUARDS_TRY_THRESHOLD_TIME and GUARDS_FAILOVER_THRESHOLD provide.

I inline various questions below. I can take a deeper look tomorrow as well.

> - Start of algorithm
>   - Ensure USED_GUARDS is populated (from state file etc)
>   - Create a list of PRIMARY_GUARDS that contain N_PRIMARY_GUARDS that are not bad by:
>     - Taking the next entry from USED_GUARDS
>     - If USED_GUARDS is empty:
>       - randomly select an entry from UTOPIC_GUARDS, weighted by bandwidth

Hmm. PRIMARY_GUARDS stays like this for ever?

It probably needs to be updated when a primary guard get removed from the
consensus (become bad), and put the next entry of USED_GUARDS in its place.

Maybe we need another event (and algorithm) for "receive new consensus"?
Or is this considered out of scope?

>   - Set TRIED_GUARDS to be an empty set

Hm. What is the point of TRIED_GUARDS? It's like a list of guards that we found
unreachable at some point in time. It's never cleaned, so it will eventually
accumulate many nodes. IIUC, we only use that list to enforce our thresholds by
looking at timestamps.

Can we do without this list, just by checking the timestamps of unreachable
USED_GUARDS?  Or the logic will be much more complicated?

>   - Set TRIED_DYSTOPIC_GUARDS to be an empty set
>   - Set state = STATE_PRIMARY_GUARDS
> 
> - Each iteration of algorithm
>   - If it was at least 3 minutes since we tried the primary guards and we are not in STATE_PRIMARY_GUARDS:
>     - save previous state
>     - set state = STATE_PRIMARY_GUARDS
> 
>   - STATE_PRIMARY_GUARDS:
>     - return each entry in PRIMARY_GUARDS in turn
>       - mark each entry as "unreachable" if algorithm doesn't terminate

What is the "algorithm" here? Is it the guard selection algorithm, or the
circuit construction algorithm? I guess the latter?

If it's indeed the circuit construction algorithm, then the steps here are not
asynchronous anymore, which diverges from how Tor networking is designed. Maybe
this could be fixed by introducing some new states for when we are waiting for
a circuit to connect and we are now learning whether it was succesful or
not. Basically what entry_guard_register_connect_status() does in Tor right
now.

For example we could have STATE_WAITING_FOR_PRIMARY_GUARD_CIRCUIT for when we
are waiting for a primary guard circuit to connect. If it fails, then we need
to mark the entry as unreachable. Maybe we also need
STATE_WAITING_FOR_UTOPIC_GUARD_CIRCUIT etc.

>     - restore previous state (or STATE_TRY_UTOPIC if no previous)
>
> 
>   - STATE_TRY_UTOPIC:
>     - for each entry in TRIED_GUARDS that was marked as unreachable more than 20 minutes ago
>       - add to the front of remaining UTOPIC_GUARDS

Isn't it already in UTOPIC_GUARDS? I don't see us ever removing stuff from UTOPIC_GUARDS.

BTW, what is this step supposed to do? Setup previously TRIED_GUARDS to be retried?

>     - return each remaining entry from USED_GUARDS in turn
>       - for each entry, if algorithm doesn't terminate
>         - mark entry as "unreachable"
>         - add entry to TRIED_GUARDS

Add it there even if it it's already there (TRIED_GUARDS is never cleaned)?
Or maybe update a timestamp if it's already there? How to specify this nicely?

>         - if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME
>           is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
>         - if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD
>           proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC
>     - return each remaining entry from UTOPIC_GUARDS randomly selected, weighted by bandwidth
>       - for each entry, if algorithm doesn't terminate
>         - mark entry as "unreachable"
>         - add entry to TRIED_GUARDS
>         - if the number of entries in TRIED_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME
>           is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
>         - if the number of entries in TRIED_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD
>           proportion of UTOPIC_GUARDS, set state = STATE_TRY_DYSTOPIC
> 

TRIED_GUARDS is never cleaned, so this will be almost always true after a while.

>   - STATE_TRY_DYSTOPIC:
>     - for each entry in TRIED_DYSTOPIC_GUARDS that was marked as unreachable more than 20 minutes ago
>       - add to the front of remaining UTOPIC_GUARDS

Does it matter if we add it to the front or the back of UTOPIC_GUARDS?
>From what I understand, we always sample from UTOPIC_GUARDS based on relay
bandwidth so the list order should not matter, right?

[This comment and also the comments below also apply for the STATE_TRY_UTOPIC case above]

>     - return each remaining DYSTOPIC entry from USED_GUARDS in turn

Do we even try previously-found unreachable guards here?

>       - for each entry, if algorithm doesn't terminate
>         - mark entry as "unreachable"
>         - add entry to TRIED_DYSTOPIC_GUARDS

Order matters here right? We probably want FIFO. 

>         - if the number of entries in TRIED_GUARDS+TRIED_DYSTOPIC_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME
>           is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
>         - if the number of entries in TRIED_DYSTOPIC_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD
>           proportion of DYSTOPIC_GUARDS:
>           - mark all guards in PRIMARY_GUARDS, TRIED_GUARDS and TRIED_DYSTOPIC_GUARDS as not "unreachable"
>           - return failure from the algorithm
>     - return each remaining entry from DYSTOPIC_GUARDS randomly selected, weighted by bandwidth
>       - for each entry, if algorithm doesn't terminate
>         - mark entry as "unreachable"
>         - add entry to TRIED_DYSTOPIC_GUARDS
>         - if the number of entries in TRIED_GUARDS+TRIED_DYSTOPIC_GUARDS that were tried within GUARDS_TRY_THRESHOLD_TIME
>           is larger than GUARDS_TRY_THRESHOLD, return failure from the algorithm
>         - if the number of entries in TRIED_DYSTOPIC_GUARDS is larger than a GUARDS_FAILOVER_THRESHOLD
>           proportion of DYSTOPIC_GUARDS:
>           - mark all guards in PRIMARY_GUARDS, TRIED_GUARDS and TRIED_DYSTOPIC_GUARDS as not "unreachable"

Hm, not sure what this is supposed to do.

Is it trying to setup the algorithm to retry all guards from the beginning?
Shouldn't we edit USED_+GUARDS in that case?

>           - return failure from the algorithm
> 

Instead of failing in this case, shouldn't we just retry from the top without
adding any new guards if we hit these thresholds? How do we do this? Or is this
logic considered out of scope? :x

> - End of algorithm
>   - If circuit is set up correctly, let algorithm know
>     - Algorithm marks the guard chosen as used and makes sure it is in USED_GUARDS
>   - Otherwise do another run of the algorithm
> 

When does this "End of algorithm" happen?