[or-cvs] r9465: Write the entry guards section of path-spec; note a possible (in tor/trunk: . doc doc/spec src/or)

Tue Jan 30 22:19:39 UTC 2007

Author: nickm
Date: 2007-01-30 17:19:31 -0500 (Tue, 30 Jan 2007)
New Revision: 9465

Modified:
   tor/trunk/
   tor/trunk/doc/TODO
   tor/trunk/doc/spec/path-spec.txt
   tor/trunk/src/or/circuitbuild.c
Log:
 r11606 at catbus:  nickm | 2007-01-30 16:52:23 -0500
 Write the entry guards section of path-spec; note a possible bug in circuitbuild.c; add a const; defer work on torrc.complete to be part of a bigger config documentation reorg.



Property changes on: tor/trunk
___________________________________________________________________
 svk:merge ticket from /tor/trunk [r11606] on 8246c3cf-6607-4228-993b-4d95d33730f1

Modified: tor/trunk/doc/TODO
===================================================================

--- tor/trunk/doc/TODO	2007-01-30 13:02:36 UTC (rev 9464)
+++ tor/trunk/doc/TODO	2007-01-30 22:19:31 UTC (rev 9465)
@@ -69,7 +69,7 @@
 N - DNS improvements
     . Asynchronous DNS
       - Make evdns use windows strerror equivalents.
-      - Make sure patches get into libevent.
+      . Make sure patches get into libevent.
       - Verify that it works well on windows
     - Debug and re-enable server-side reverse DNS caching
 
@@ -105,7 +105,6 @@
     - More prominently, we should have a recommended apps list.
       - recommend gaim.
       - unrecommend IE because of ftp:// bug.
-N   - torrc.complete.in needs attention?
 N   - we should add a preamble to tor-design saying it's out of date.
 N   - Document transport and natdport
 
@@ -409,6 +408,7 @@
   - Look into generating torrc.{complete|sample}.in, tor.1.in,
     the HTML manual, and the online config documentation from a single
     source.
+    - torrc.complete.in needs attention?
 
 Future version:
   - Configuration format really wants sections.

Modified: tor/trunk/doc/spec/path-spec.txt
===================================================================
--- tor/trunk/doc/spec/path-spec.txt	2007-01-30 13:02:36 UTC (rev 9464)
+++ tor/trunk/doc/spec/path-spec.txt	2007-01-30 22:19:31 UTC (rev 9465)
@@ -135,10 +135,11 @@
    If we fail to build a circuit N times in a X second period (see Section
    2.3 for how this works), we stop building circuits until the X seconds
    have elapsed.
-   XXX
+   XXXX
 
 2.1.6. When to tear down circuits
 
+   XXXX
 
 2.2. Path selection and constraints
 
@@ -267,37 +268,61 @@
 
 5. Guard nodes
 
-  XXX writeme
+  We use Guard nodes (also called "helper nodes" in the literature) to
+  prevent certain profiling attacks.  Here's the risk: if we choose entry and
+  exit nodes at random, and an attacker controls C out of N servers, then the
+  attacker will control both the entry and exit node of any given circuit
+  with probability (C/N)^2.  But as we make many different circuits over
+  time, the probability that the attacker will see a sample of about (C/N)^2
+  of our traffic goes to 1.  Since statistical sampling works, the attacker
+  can be sure of learning a profile of our behavior.
 
+  If, on the other hand, we picked an entry node and held it fixed, we would
+  have probability C/N of choosing a bad entry and being profiled, and
+  probability (N-C)/N of choosing a good entry and not being profiled.
+
+  When guard nodes are enabled, Tor maintains an ordered list of entry nodes
+  as our chosen guards, and stores this list persistently to disk.  If a Guard
+  node becomes unusable, rather than replacing it, Tor adds new guards to the
+  end of the list.  When it comes time to choose an entry, Tor chooses at
+  random from among the first NumEntryGuards (default 3) usable guards on the
+  list.  If there are not at least 2 usable guards on the list, Tor adds
+  routers until there are, or until there are no more usable routers to add.
+
+  A guard is unusable if any of the following hold:
+    - it is not marked as a Guard by the networkstatuses,
+    - it is not marked Valid (and the user hasn't set AllowInvalid entry)
+    - it is not marked Running
+    - Tor couldn't reach it the last time it tried to connect
+
+  A guard is unusable for a particular circuit if any of the rules for path
+  selection in 2.2 are not met.  In particular, if the circuit is "fast"
+  and the guard is not Fast, or if the circuit is "stable" and the guard is
+  not Stable, Tor can't use the guard for that circuit.
+
+  If the guard is excluded because of its status in the networkstatuses for
+  over 30 days, Tor removes it from the list entirely, preserving order.
+
+  If Tor fails to connect to an otherwise usable guard, it retries
+  periodically: every hour for six hours, every four hours for 3 days, every
+  18 hours for a week, and every 36 hours thereafter.  Additionally, Tor
+  retries unreachable guards the first time it adds a new guard to the list,
+  since it is possible that the old guards were only marked as unreachable
+  because the network was unreachable or down.
+
+  Tor does not add a guard persistently to the list until the first time we
+  have connected to it successfully.
+
 6. Testing circuits
 
+  XXXX
 
 
 
-(From some emails by arma)
 
-Right now the code exists to pick helper nodes, store our choices to
-disk, and use them for our entry nodes. But there are three topics
-to tackle before I'm comfortable turning them on by default. First,
-how to handle churn: since Tor nodes are not always up, and sometimes
-disappear forever, we need a plan for replacing missing helpers in a
-safe way. Second, we need a way to distinguish "the network is down"
-from "all my helpers are down", also in a safe way. Lastly, we need to
-examine the situation where a client picks three crummy helper nodes
-and is forever doomed to a lousy Tor experience. Here's my plan:
+X. Old notes
 
-How to handle churn.
-  - Keep track of whether you have ever actually established a
-    connection to each helper. Any helper node in your list that you've
-    never used is ok to drop immediately. Also, we don't save that
-    one to disk.
-  - If all our helpers are down, we need more helper nodes: add a new
-    one to the *end*of our list. Only remove dead ones when they have
-    been gone for a very long time (months).
-  - Pick from the first n (by default 3) helper nodes in your list
-    that are up (according to the network-statuses) and reachable
-    (according to your local firewall config).
-    - This means that order matters when writing/reading them to disk.
+X.1. Do we actually do this?
 
 How to deal with network down.
   - While all helpers are down/unreachable and there are no established
@@ -317,110 +342,11 @@
     testing circuit, can we get away with converting it to a normal
     circuit and beginning to use it immediately?)
 
-How to pick non-sucky helpers.
-  - When we're picking a new helper nodes, don't use ones which aren't
-    reachable according to our local ReachableAddresses configuration.
-  (There's an attack here: if I pick my helper nodes in a very
-   restrictive environment, say "ReachableAddresses 18.0.0.0/255.0.0.0:*",
-   then somebody watching me use the network from another location will
-   guess where I first joined the network. But let's ignore it for now.)
-  - Right now we choose new helpers just like we'd choose any entry
-    node: they must be "stable" (claim >1day uptime) and "fast" (advertise
-    >10kB capacity). In 0.1.1.11-alpha, clients let dirservers define
-    "stable" and "fast" however they like, and they just believe them.
-    So the next step is to make them a function of the current network:
-    e.g. line up all the 'up' nodes in order and declare the top
-    three-quarter to be stable, fast, etc, as long as they meet some
-    minimum too.
-  - If that's not sufficient (it won't be), dirservers should introduce
-    a new status flag: in additional to "stable" and "fast", we should
-    also describe certain nodes as "entry", meaning they are suitable
-    to be chosen as a helper. The first difference would be that we'd
-    demand the top half rather than the top three-quarters. Another
-    requirement would be to look at "mean time between returning" to
-    ensure that these nodes spend most of their time available. (Up for
-    two days straight, once a month, is not good enough.)
-  - Lastly, we need a function, given our current set of helpers and a
-    directory of the rest of the network, that decides when our helper
-    set has become "too crummy" and we need to add more. For example,
-    this could be based on currently advertised capacity of each of
-    our helpers, and it would also be based on the user's preferences
-    of speed vs. security.
+  [Do we actually do any of the above?  If so, let's spec it.  If not, let's
+  remove it. -NM]
 
-***
+X.2. A thing we could do to deal with reachability.
 
-Lasse wrote:
-> I am a bit concerned with performance if we are to have e.g. two out of
-> three helper nodes down or unreachable. How often should Tor check if
-> they are back up and running?
-
-Right now Tor believes a threshold of directory servers when deciding
-whether each server is up. When Tor observes a server to be down
-(connection failed or building the first hop of the circuit failed),
-it marks it as down and doesn't try it again, until it gets a new
-network-status from somebody, at which point it takes a new concensus
-and marks the appropriate servers as up.
-
-According to sec 5.1 of dir-spec.txt, the client will try to fetch a new
-network-status at least every 30 minutes, and more often in certain cases.
-
-With the proposed scheme, we'll also mark all our helpers as up shortly
-after the last one is marked down.
-
-> When should there be
-> added an extra node to the helper node list? This is kind of an
-> important threshold?
-
-I agree, this is an important question. I don't have a good answer yet. Is
-it terrible, anonymity-wise, to add a new helper every time only one of
-your helpers is up? Notice that I say add rather than replace -- so you'd
-only use this fourth helper when one of your main three helpers is down,
-and if three of your four are down, you'd add a fifth, but only use it
-when two of the first four are down, etc.
-
-In fact, this may be smarter than just picking a random node for your
-testing circuit, because if your network goes up and down a lot, then
-eventually you have a chance of using any entry node in the network for
-your testing circuit.
-
-We have a design choice here. Do we only try to use helpers for the
-connections that will have streams on them (revealing our communication
-partners), or do we also want to restrict the overall set of nodes that
-we'll connect to, to discourage people from enumerating all Tor clients?
-
-I'm increasingly of the belief that we want to hide our presence too,
-based on the fact that Steven and George and others keep coming up with
-attacks that start with "Assuming we know the set of users".
-
-If so, then here's a revised "How to deal with network down" section:
-
-  1) When a helper is marked down or the helper list shrinks, and as
-     a result the total number of helpers that are either (up and
-     reachable) or (reachable but never connected to) is <= 1, then pick
-     a new helper and add it to the end of the list.
-     [We count nodes that have never been connected to, since otherwise
-      we might keep on adding new nodes before trying any of them. By
-      "reachable" I mean "is allowed by ReachableAddresses".]
-  2) When you fail to connect to a helper that has never been connected
-     to, you remove him from the list right then (and the above rule
-     might kick in).
-  3) When you succeed at connecting to a helper that you've never
-     connected to before, mark all reachable helpers earlier in the list
-     as up, and close that circuit.
-     [We close the circuit, since if the other helpers are now up, we
-      prefer to use them for circuits that will reveal communication
-      partners.]
-
-This certainly seems simpler. Are there holes that I'm missing?
-
-> If running from a laptop you will meet different firewall settings, so
-> how should Helper Nodes settings keep up with moving from an open
-> ReachableAddresses to a FascistFirewall setting after the helper nodes
-> have been selected?
-
-I added the word "reachable" to three places in the above list, and I
-believe that totally solves this question.
-
 And as a bonus, it leads to an answer to Nick's attack ("If I pick
 my helper nodes all on 18.0.0.0:*, then I move, you'll know where I
 bootstrapped") -- the answer is to pick your original three helper nodes
@@ -429,123 +355,18 @@
 likely (though not certain) that some of the originals will become useful.
 Is that smart or just complex?
 
-> What happens if(when?) performance of the third node is bad?
+X.3. Some stuff that worries me about entry guards. 2006 Jun, Nickm.
 
-My above solution solves this a little bit, in that we always try to
-have two nodes available. But what if they are both up but bad? I'm not
-sure. As my previous mail said, we need some function, given our list
-of helpers and the network directory, that will tell us when we're in a
-bad situation. I can imagine some simple versions of this function --
-for example, when both our working helpers are in the bottom half of
-the nodes, ranked by capacity.
+  It is unlikely for two users to have the same set of entry guards.
+  Observing a user is sufficient to learn its entry guards.  So, as we move
+  around, entry guards make us linkable.  If we want to change guards when
+  our location (IP? subnet?) changes, we have two bad options.  We could
+    - Drop the old guards.  But if we go back to our old location,
+      we'll not use our old guards.  For a laptop that sometimes gets used
+      from work and sometimes from home, this is pretty fatal.
+    - Remember the old guards as associated with the old location, and use
+      them again if we ever go back to the old location.  This would be
+      nasty, since it would force us to record where we've been.
 
-But the hard part: what's the remedy when we decide there's something
-to fix? Do we add a third, and now we have two crummy ones and a new
-one? Or do we drop one or both of the bad ones?
-
-Perhaps we believe the latest claim from the network-status concensus,
-and we count a helper the dirservers believe is crummy as "not worth
-trying" (equivalent to "not reachable under our current ReachableAddresses
-config") -- and then the above algorithm would end up adding good ones,
-but we'd go back to the originals if they resume being acceptable? That's
-an appealing design. I wonder if it will cause the typical Tor user to
-have a helper node list that comprises most of the network, though. I'm
-ok with this.
-
-> Another point you might want to keep in mind, is the possibility to
-> reuse the code in order to add a second layer helper node (meaning node
-> number two) to "protect" the first layer (node number one) helper nodes.
-> These nodes should be tied to each of the first layer nodes. E.g. there
-> is one helper node list, as described in your mail, for each of the
-> first layer nodes, following their create/destroy.
-
-True. Does that require us to add a fourth hop to our path length,
-since the first hop is from a limited set, the second hop is from a
-limited set, and the third hop might also be constrained because, say,
-we're asking for an unusual exit port?
-
-> Another of the things might worth adding to the to do list is
-> localization of server (helper) nodes. Making it possible to pick
-> countries/regions where you do (not) want your helper nodes located. (As
-> in "HelperNodesLocated us,!eu" etc.) I know this requires the use of
-> external data and may not be worth it, but it _could_ be integrated at
-> the directory servers only -- adding a list of node IP's and e.g. a
-> country/region code to the directory and thus reduce the overhead. (?)
-> Maybe extending the Family-term?
-
-I think we are heading towards doing path selection based on geography,
-but I don't have a good sense yet of how that will actually turn out --
-that is, with what mechanism Tor clients will learn the information they
-need. But this seems to be something that is orthogonal to the rest of
-this discussion, so I look forward to having somebody else solve it for
-us, and fitting it in when it's ready. :)
-
-> And I would like to keep an option to pick the first X helper nodes
-> myself and then let Tor extend this list if these nodes are down (like
-> EntryNodes in current code). Even if this opens up for some new types of
-> "relationship" attacks.
-
-Good idea. Here's how I'd like to name these:
-
-The "EntryNodes" config option is a list of seed helper nodes. When we
-read EntryNodes, any node listed in entrynodes but not in the current
-helper node list gets *pre*pended to the helper node list.
-
-The "NumEntryNodes" config option (currently called NumHelperNodes)
-specifies the number of up, reachable, good-enough helper nodes that
-will make up the pool of possible choices for first hop, counted from
-the front of the helper node list until we have enough.
-
-The "UseEntryNodes" config option (currently called UseHelperNodes)
-tells us to turn on all this helper node behavior. If you set EntryNodes,
-then this option is implied.
-
-The "StrictEntryNodes" config option, provided for backward compatibility
-and for debugging, means a) we replace the helper node list with the
-current EntryNodes list, and b) whenever we would do an operation that
-alters the helper node list, we don't. (Yes, this means that if all the
-helper nodes are down, we lose until we mark them up again. But this is
-how it behaves now.)
-
-> I am sure my next point has been asked before, but what about testing
-> the current speed of the connections when looking for new helper nodes,
-> not only testing the connectivity? I know this might contribute to a lot
-> of overhead in the network, but if this only occur e.g. when using
-> helper nodes as a Hidden Service it might not have that large an impact,
-> but could help availability for the services?
-
-If we're just going to be testing them when we're first picking them,
-then it seems we can do the same thing by letting the directory servers
-test them. This has the added benefit that all the (behaving) clients
-use the same data, so they don't end up partitioned by a node that
-(for example) performs selectively for his victims.
-
-Another idea would be to periodically keep track of what speeds you get
-through your helpers, and make decisions from this. The reason we haven't
-done this yet is because there are a lot of variables -- perhaps the
-web site is slow, perhaps some other node in the path is slow, perhaps
-your local network is slow briefly, perhaps you got unlucky, etc.  I
-believe that over time (assuming the user has roughly the same browsing
-habits) all of these would average out and you'd get a usable answer,
-but I don't have a good sense of how long it would take to converge,
-so I don't know whether this would be worthwhile.
-
-> BTW. I feel confortable with all the terms helper/entry/contact nodes,
-> but I think you (the developers) should just pick one and stay with it
-> to avoid confusion.
-
-I think I'm going to try to co-opt the term 'Entry' node for this
-purpose. We're going to have to keep referring to helper nodes for the
-research community for a while though, so they realize that Tor does
-more than just let users ask for certain entry nodes.
-
-
-
-============================================================
-Some stuff that worries me about entry guards. 2006 Jun, Nickm.
-
-1. It is unlikely for two users to have the same set of entry guards.
-
-2. Observing a user is sufficient to learn its entry guards.
-
-3. So, as we move around, we leak our 
+  [Do we do any of this now? If not, this should move into 099-misc or
+  098-todo. -NM]

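Note on the guard retry schedule in the new section 5 of path-spec.txt: the
schedule can be read as a step function of how long a guard has been
unreachable. The following is a minimal sketch of that reading, not the code
in circuitbuild.c. The names guard_retry_interval and guard_is_time_to_retry
are hypothetical, and the sketch assumes each period ("for 3 days", "for a
week") is measured from the moment the guard first became unreachable.

```c
#include <assert.h>
#include <time.h>

#define HOUR (60L * 60L)
#define DAY  (24L * HOUR)

/* Hypothetical helper (not Tor's API): given how many seconds a guard has
 * been unreachable, return the minimum number of seconds to wait between
 * connection attempts.  Assumes the periods in the spec text are cumulative,
 * i.e. counted from when the guard first became unreachable. */
static long
guard_retry_interval(long unreachable_for)
{
  assert(unreachable_for >= 0);
  if (unreachable_for < 6 * HOUR)
    return HOUR;        /* every hour for six hours */
  if (unreachable_for < 3 * DAY)
    return 4 * HOUR;    /* every four hours for 3 days */
  if (unreachable_for < 7 * DAY)
    return 18 * HOUR;   /* every 18 hours for a week */
  return 36 * HOUR;     /* every 36 hours thereafter */
}

/* Hypothetical helper: return 1 if enough time has passed since the last
 * attempt to try this guard again, else 0. */
static int
guard_is_time_to_retry(time_t unreachable_since, time_t last_attempted,
                       time_t now)
{
  long down_for = (long)(now - unreachable_since);
  long since_attempt = (long)(now - last_attempted);
  if (down_for < 0)
    down_for = 0;       /* tolerate a clock that jumped backwards */
  return since_attempt > guard_retry_interval(down_for);
}
```

A caller would record the time of the first failure and the time of the most
recent attempt for each guard, and consult guard_is_time_to_retry() before
trying that guard again. The extra retry that the spec describes when a new
guard is added to the list is not modeled here.
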
Modified: tor/trunk/src/or/circuitbuild.c
===================================================================
--- tor/trunk/src/or/circuitbuild.c	2007-01-30 13:02:36 UTC (rev 9464)
+++ tor/trunk/src/or/circuitbuild.c	2007-01-30 22:19:31 UTC (rev 9465)
@@ -1918,7 +1918,7 @@
 /** Return 1 if <b>digest</b> matches the identity of any node
  * in the entry_guards list. Else return 0. */
 static INLINE int
-is_an_entry_guard(char *digest)
+is_an_entry_guard(const char *digest)
 {
   SMARTLIST_FOREACH(entry_guards, entry_guard_t *, entry,
                     if (!memcmp(digest, entry->identity, DIGEST_LEN))
@@ -2219,6 +2219,11 @@
           r = entry_is_live(e, 0, 1, 1);
           if (r && !r->is_running) {
             refuse_conn = 1;
+            /* XXXX012 I think this might be broken; when picking entry nodes,
+             * we only look at unreachable_since and is_time_to_retry, and we
+             * pay no attention to is_running. If this is indeed the case, we
+             * can fix the bug by adding a retry_as_entry flag to
+             * routerinfo_t. -NM */
             r->is_running = 1;
           }
         }
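
Note on guard selection as specified in section 5 of path-spec.txt above: the
rule "choose at random from among the first NumEntryGuards usable guards on
the list" can be sketched as below. This is an illustration under stated
assumptions, not the implementation in circuitbuild.c. The type
guard_sketch_t and the functions guard_is_usable and choose_entry_guard are
hypothetical, the random number is supplied by the caller, and the
AllowInvalid override and the per-circuit Fast/Stable checks are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a guard list entry. */
typedef struct {
  int is_guard;    /* marked Guard by the networkstatuses */
  int is_valid;    /* marked Valid */
  int is_running;  /* marked Running */
  int reachable;   /* last connection attempt succeeded */
} guard_sketch_t;

/* A guard is usable only if none of the four "unusable" conditions hold.
 * (The AllowInvalid exception is ignored in this sketch.) */
static int
guard_is_usable(const guard_sketch_t *g)
{
  return g->is_guard && g->is_valid && g->is_running && g->reachable;
}

/* Return the index of the chosen guard, or -1 if no guard is usable.
 * Only the first num_entry_guards usable entries, in list order, are
 * candidates.  rnd is a caller-supplied random number. */
static int
choose_entry_guard(const guard_sketch_t *guards, size_t n_guards,
                   size_t num_entry_guards, unsigned rnd)
{
  size_t i, n_live = 0, pick;
  assert(guards != NULL || n_guards == 0);

  /* Count usable guards, stopping once we have enough candidates. */
  for (i = 0; i < n_guards && n_live < num_entry_guards; ++i) {
    if (guard_is_usable(&guards[i]))
      ++n_live;
  }
  if (n_live == 0)
    return -1;

  /* Walk the list again and return the pick-th usable guard. */
  pick = rnd % n_live;
  for (i = 0; i < n_guards; ++i) {
    if (!guard_is_usable(&guards[i]))
      continue;
    if (pick == 0)
      return (int)i;
    --pick;
  }
  return -1; /* not reached */
}
```

Because unusable guards stay on the list and new ones are appended, a guard
that becomes usable again moves back into the candidate window ahead of any
guard added after it. That is the reason order matters when the list is
written to and read from disk.
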