Exit Balancing Patch

Wed Jul 18 08:36:02 UTC 2007

I figure that this probably should be reviewed on or-dev, since it is
a relatively small patch to smartlist_choose_by_bandwidth, but with
potentially large implications for load balancing on Tor (which I
believe is broken). Since balancing is key to usability and thus
anonymity, I believe this patch should go into both 0.1.2.x and 0.2.0.


Presently, Tor attempts to balance exit node usage thusly (from
path-spec.txt):

  For non-exit positions on "fast" circuits, we pick routers as
  above, but we weight the clipped advertised bandwidth of Exit-flagged
  nodes depending on the fraction of bandwidth available from non-Exit
  nodes.  Call the total clipped advertised bandwidth for Exit nodes
  under consideration E, and the total clipped advertised bandwidth for
  non-Exit nodes under consideration N.  If E<N/2, we do not consider
  Exit-flagged nodes.  Otherwise, we weight their bandwidth with the
  factor (E-N/2)/(N+E-N/2) == (2E - N)/(2E + N).  This ensures that
  bandwidth is evenly distributed over nodes in 3-hop paths.


It is possible this formula is vaguely/empirically correct, but it is
so obtuse I think it should be replaced with the more straightforward
calculation with T being the Total network bandwidth (exit+nonexit):

- if E < T/3 we do not consider exit-flagged nodes (and never choose
  them for entry and middle positions)

- else (if E > T/3) we take the leftover E-T/3, divide it by E and get
  exit_weight=(E-T/3)/E. This will be used to weight the bandwidth of
  all exit-flagged nodes.

It is easy to demonstrate that the sum of the weighted total available
exit bandwidth is E-T/3 using this formula, which is the "leftover"
bandwidth after exits are used in one third of all node choices
(assuming the network is otherwise balanced). The original formula
may also provably correct, but I'm not sure how to go about it. It
seems wrong to me though.

This formula is also much more readily adaptable to 2 and 4 hop paths
than the above formula.


Additionally, the patch does the following two things:
1. Expand MAX_BELIEVABLE_BANDWIDTH from 1.5MBytes/sec to 10Mbytes/sec. 
   We already have several Tor nodes with near 5Mbytes/sec. Capping them 
   at 1.5MBytes/sec artifically dumps load on slower nodes. There are
   effective ways to detect/prevent liars. This #define is not one of 
   them. If any nodes exceed the limit, the Tor client will print
   out a notice (mostly just to tell us we need to raise it again).

2. Explicitly ensure we don't return exits if one wasn't requested or 
   the exit_weight=0. It may be a wasted check though, since 
   exit_weight should prevent the tmp variable from exceeding the
   selected bw for exit nodes.

-- 
Mike Perry
Mad Computer Scientist
fscked.org evil labs
-------------- next part --------------
Index: src/or/routerlist.c
===================================================================

--- src/or/routerlist.c	(revision 10850)
+++ src/or/routerlist.c	(working copy)
@@ -1145,7 +1145,7 @@
 
 /** Do not weight any declared bandwidth more than this much when picking
  * routers by bandwidth. */
-#define MAX_BELIEVABLE_BANDWIDTH 1500000 /* 1.5 MB/sec */
+#define MAX_BELIEVABLE_BANDWIDTH 10000000 /* 10 MB/sec */
 
 /** Helper function:
  * choose a random element of smartlist <b>sl</b>, weighted by
@@ -1155,7 +1155,8 @@
  * routerinfo_t's. Otherwise it's a list of routerstatus_t's.
  *
  * If <b>for_exit</b>, we're picking an exit node: consider all nodes'
- * bandwidth equally regardless of their Exit status.  If not <b>for_exit</b>,
+ * bandwidth equally regardless of their Exit status, since there may be
+ * some in the list because they exit to obscure ports. If not <b>for_exit</b>,
  * we're picking a non-exit node: weight exit-node's bandwidth less
  * depending on the smallness of the fraction of Exit-to-total bandwidth.
  */
@@ -1164,7 +1165,7 @@
 {
   int i;
   routerinfo_t *router;
-  routerstatus_t *status;
+  routerstatus_t *status=NULL;
   int32_t *bandwidths;
   int is_exit;
   uint64_t total_nonexit_bw = 0, total_exit_bw = 0, total_bw = 0;
@@ -1202,8 +1203,22 @@
       this_bw = router_get_advertised_bandwidth(router);
     }
     /* if they claim something huge, don't believe it */
-    if (this_bw > MAX_BELIEVABLE_BANDWIDTH)
+    if (this_bw > MAX_BELIEVABLE_BANDWIDTH) {
+      char fp[HEX_DIGEST_LEN+1];
+      if (status) {
+          base16_encode(fp, sizeof(fp),
+                  status->identity_digest, DIGEST_LEN);
+      } else if (router) {
+          base16_encode(fp, sizeof(fp),
+                  router->cache_info.identity_digest, DIGEST_LEN);
+      }
+      log_notice(LD_DIR,
+          "Bandwidth %d for router %s (%s) exceeds allowed max %d, capping",
+          this_bw, router ? router->nickname : "(null)",
+          status || router ? fp : "0",
+          MAX_BELIEVABLE_BANDWIDTH);
       this_bw = MAX_BELIEVABLE_BANDWIDTH;
+    }
     if (is_known) {
       bandwidths[i] = (int32_t) this_bw; // safe since MAX_BELIEVABLE<INT32_MAX
       if (is_exit)
@@ -1250,20 +1265,22 @@
 
   /* Figure out how to weight exits. */
   if (for_exit) {
-    /* If we're choosing an exit node, exit bandwidth counts fully. */
+    /* If we're choosing an exit node, presumably all nodes in the list
+     * can access our port. Count total */
     exit_weight = 1.0;
     total_bw = total_exit_bw + total_nonexit_bw;
-  } else if (total_exit_bw < total_nonexit_bw / 2) {
+  } else if (total_exit_bw < (total_nonexit_bw+total_exit_bw) / 3) {
     /* If we're choosing a relay and exits are greatly outnumbered, ignore
      * them. */
     exit_weight = 0.0;
     total_bw = total_nonexit_bw;
   } else {
-    /* If we're choosing a relay and exits aren't outnumbered use the formula
-     * from path-spec. */
-    uint64_t leftover = (total_exit_bw - total_nonexit_bw / 2);
-    exit_weight = U64_TO_DBL(leftover) /
-      U64_TO_DBL(leftover + total_nonexit_bw);
+    /* If the total exit bandwidth is greater than ONE THIRD of the
+     * TOTAL network bandwidth then the leftover = E-T/3. The weight ratio
+     * is then (E-T/3)/E. */
+    uint64_t leftover = total_exit_bw-(total_exit_bw+total_nonexit_bw)/3;
+    exit_weight = U64_TO_DBL(leftover)/U64_TO_DBL(total_exit_bw);
+
     total_bw =  total_nonexit_bw +
       DBL_TO_U64(exit_weight * U64_TO_DBL(total_exit_bw));
   }
@@ -1292,6 +1309,11 @@
       tmp += ((uint64_t)(bandwidths[i] * exit_weight));
     else
       tmp += bandwidths[i];
+
+    /* Don't return exits if not requested and they are scarce */
+    if (is_exit && !(for_exit || exit_weight > 0.0))
+      continue;
+
     if (tmp >= rand_bw)
       break;
   }
Index: doc/spec/path-spec.txt
===================================================================
--- doc/spec/path-spec.txt	(revision 10850)
+++ doc/spec/path-spec.txt	(working copy)
@@ -182,17 +182,16 @@
    proportional to its advertised bandwidth [the smaller of the 'rate' and
    'observed' arguments to the "bandwidth" element in its descriptor].  If a
    router's advertised bandwidth is greater than MAX_BELIEVABLE_BANDWIDTH
-   (1.5 MB/s), we clip to that value.
+   (10 MB/s), we clip to that value.
 
    For non-exit positions on "fast" circuits, we pick routers as above, but
    we weight the clipped advertised bandwidth of Exit-flagged nodes depending
    on the fraction of bandwidth available from non-Exit nodes.  Call the
    total clipped advertised bandwidth for Exit nodes under consideration E,
-   and the total clipped advertised bandwidth for non-Exit nodes under
-   consideration N.  If E<N/2, we do not consider Exit-flagged nodes.
-   Otherwise, we weight their bandwidth with the factor (E-N/2)/(N+E-N/2) ==
-   (2E - N)/(2E + N).  This ensures that bandwidth is evenly distributed over
-   nodes in 3-hop paths.
+   and the total clipped advertised bandwidth for all nodes under
+   consideration T.  If E<T/3, we do not consider Exit-flagged nodes.
+   Otherwise, we weight their bandwidth with the factor (E-T/3)/E. This 
+   ensures that bandwidth is evenly distributed over nodes in 3-hop paths.
 
    Additionally, we may be building circuits with one or more requests in
    mind.  Each kind of request puts certain constraints on paths:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20070718/f960c457/attachment.pgp>