[tor-bugs] #12595 [Core Tor/Tor]: Finalize design for improved guard-node behavior
Tor Bug Tracker & Wiki
blackhole at torproject.org
Tue Jun 7 14:04:41 UTC 2016
#12595: Finalize design for improved guard-node behavior
-------------------------------------------------+-------------------------
Reporter: asn | Owner: asn
Type: defect | Status:
Priority: High | assigned
Component: Core Tor/Tor | Milestone: Tor:
Severity: Normal | 0.2.9.x-final
Keywords: tor-guard, TorCoreTeam201606, | Version: Tor:
028-triaged, mike-can, prop259, tor-guards- | 0.2.7
revamp | Resolution:
Parent ID: | Actual Points:
Reviewer: | Points: 3
| Sponsor:
| SponsorU-must
-------------------------------------------------+-------------------------
Comment (by asn):
Hello. Here is a small status report on this project.
Let me first mention the main problems of the current Tor guard algorithm
that proposal 259 tries to address:
'''ISSUE1''': The current Tor guard algorithm will attempt to connect to
an infinite number of guards given enough time. This is a security issue,
since a LAN adversary can firewall a Tor user until Tor eventually
connects to an adversary-controlled entry guard. In prop259, we enforce an
upper bound on the number of guards that Tor will ever attempt to connect
to (a la prop241).
'''ISSUE2''': There are various edge cases and race conditions where Tor
will think that some guards are down, and connect to lower priority
guards, even though the ones on the top are still up (e.g. bug #12450).
While it's very hard to fix all these edge cases, proposal 259 aims to
minimize the time Tor will spend connected to lower priority guards.
'''ISSUE3''': The current Tor guard algorithm is completely unspecified
and undocumented, making it very hard to fix issues and design
improvements. With prop259 we aim to provide a proper algorithm
specification (i.e. a documented state machine) that in the future could
be modded to include various improvements (firewall heuristics, etc.). See
[0] for a brief algorithm description.
The idea was also to produce clean, isolated and tested code that can be
reused and extended with ease in the future (e.g. to do multiple layers of
guards a la prop247). The current entry guard code is spaggheti and
spewed all over the codebase.
----
== State of prop259 ==
You can find the latest version of proposal 259
[https://github.com/twstrike/torspec/blob/review/proposals/259-guard-
selection.txt here], and the thoughtworks crew
[https://github.com/twstrike/tor_for_patching/tree/prop259 has already
implemented a PoC of it in the Tor codebase].
There is also [https://github.com/twstrike/tor_guardsim a Python
simulation of prop259], but I'm not sure what's its current state relative
to the prop259 spec and the little-t-tor implementation. Ola and Reinaldo
showed me some graphs of it in Valencia, that looked about what you would
expect.
That said, IMO, both the design and the implementation need heavy
improvements before we consider them for inclusion in our codebase:
The main problem with the prop259 design right now, is that the algorithm
was not designed to support multiple parallel invocations of it, which is
exactly what Tor does (multiple circuits can pick guards at the same
time). This causes problems like "Prop259 algorithm invocation #2 does not
learn the guard reachability information that invocation #1 knows, and has
to try the same dead guards again".
It is my understanding that when the thoughtworks crew realized the above
issue, they slightly changed the design such that one invocation of the
algorithm can support multiple circuit creations. However, I feel this was
done in a kludgy way on the implementation side. I feel that instead, the
right way forward would be to refactor the state machine to support this
usage model.
IMO, this is the main actual design change that needs to be done to the
algorithm. Also, the spec needs to be cleaned and simplified a bit
(because it got edited multiple times by multiple people and there are
ugly artifacts around), and some constants/states should be renamed to
better names. I feel that fixing these issues properly should take about
2-4 days of thinking time.
I have not looked too deep
[https://github.com/twstrike/tor_for_patching/tree/prop259 at the
implementation], but it seems like it's indeed implementing some version
of proposal 259. The code is workable, but it's also undocumented in
parts, and it also uses some non-standard coding idioms that we would need
to fix. Fortunately, the state machine is well tested, and it seems to
work without crashing on my system. I feel that if we revise the design in
a way we are happy, then we can use the thoughtworks implementation as a
good basis for our branch.
----
== Ways forward ==
The righteous way:
- I feel that the proper way to proceed here (if we could ignore
deliverables, lack of
engineers and funding issues) would be to finish up prop259, and
implement it in Tor. Ideally, for the first months we would be able to run
both algorithms in parallel and get statistics about them, so that we can
actually test it in real life and evaluate its security.
The honest way:
- If we don't have time to do the above, but we are still concerned about
the
security of our current guard selection (i.e. ISSUE1), we could make a
quick hotfix for ISSUE1 in our current codebase. For example, we could
modify `choose_random_entry_impl()` or `pick_entry_guards()` so that
instead of sampling guards from the whole consensus, we use a restricted
set of guards to sample from.
I imagine this is going to be easier to do than ''the righteous way'',
but also it will not help us at all with ISSUE2 and ISSUE3 which are also
quite important (e.g. since we hope to implement prop247 as well).
The cheap way:
- We already have a PoC of the current (incomplete) prop259: We have a
branch
for little-t-tor and a simulation in Python. I think this work could be
enough to satisfy sponsor deliverables if we have no manpower to work on
this. I don't actually want to take this approach, but we are all under
fire from all sides... so dropping things shouldn't be forgotten as an
option.
I'm currently unsure which approach should be taken here. I feel that more
people need to skim over prop259 before we decided that it's a good way
forward, because me and the thoughtworks team are the only people who are
actually familiar with it right now (and I'm already starting to forget
it).
----
== Footnotes ==
[0]:
With prop259, when Tor builds a circuit and needs a guard, Tor calls
START() to initialize the prop259 algo for that particular circuit, and
then it starts calling NEXT() to receive a candidate guard. Tor connects
to the candidate guard, and updates the guard reachability state
accordingly. If the candidate guard did not work, we go back to calling
NEXT(). Otherwise, we call END() and the algorithm finishes successfuly.
Prop259 also specifies the behavior of Tor when it receives a new
consensus.
Prop259 also specifies the already existing "network-up" heuristic of Tor
when it manages to connect to a guard after long times of inactivity (see
prop259 SHOULD_CONTINUE()). This heuristic is already implemented in Tor:
see first_contact at entry_guard_register_connect_status().
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/12595#comment:37>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list