[or-cvs] [tor/master] Update proposal to bring it more in-line with implementation.
arma at seul.org
arma at seul.org
Thu Sep 17 01:46:23 UTC 2009
Author: Mike Perry <mikeperry-git at fscked.org>
Date: Tue, 1 Sep 2009 20:13:52 -0700
Subject: Update proposal to bring it more in-line with implementation.
Commit: fd412549fdd6631c12d1e6d9092a9506f60d9789
---
.../proposals/151-path-selection-improvements.txt | 80 ++++++++++----------
1 files changed, 40 insertions(+), 40 deletions(-)
diff --git a/doc/spec/proposals/151-path-selection-improvements.txt b/doc/spec/proposals/151-path-selection-improvements.txt
index 3d5f07d..df19e0f 100644
--- a/doc/spec/proposals/151-path-selection-improvements.txt
+++ b/doc/spec/proposals/151-path-selection-improvements.txt
@@ -2,7 +2,7 @@ Filename: 151-path-selection-improvements.txt
Title: Improving Tor Path Selection
Author: Fallon Chen, Mike Perry
Created: 5-Jul-2008
-Status: Draft
+Status: Implemented
Overview
@@ -22,51 +22,37 @@ Implementation
Storing Build Times
- Circuit build times will be stored in the circular array
- 'circuit_build_times' consisting of uint16_t elements as milliseconds.
- The total size of this array will be based on the number of circuits
+ Circuit build times are stored in the circular array
+ 'circuit_build_times' consisting of uint32_t elements as milliseconds.
+ The total size of this array is based on the number of circuits
it takes to converge on a good fit of the long term distribution of
the circuit builds for a fixed link. We do not want this value to be
too large, because it will make it difficult for clients to adapt to
moving between different links.
- From our initial observations, this value appears to be on the order
- of 1000, but will be configurable in a #define NCIRCUITS_TO_OBSERVE.
- The exact value for this #define will be determined by performing
- goodness of fit tests using measurments obtained from the shufflebt.py
- script from TorFlow.
+ From our observations, this value appears to be on the order of 1000,
+ but is configurable in a #define NCIRCUITS_TO_OBSERVE.
Long Term Storage
- The long-term storage representation will be implemented by storing a
+ The long-term storage representation is implemented by storing a
histogram with BUILDTIME_BIN_WIDTH millisecond buckets (default 50) when
- writing out the statistics to disk. The format of this histogram on disk
- is yet to be finalized, but it will likely be of the format
- 'CircuitBuildTime <bin> <count>', with the total specified as
- 'TotalBuildTimes <total>'
+ writing out the statistics to disk. The format this takes in the
+ state file is 'CircuitBuildTime <bin-ms> <count>', with the total
+ specified as 'TotalBuildTimes <total>'
Example:
TotalBuildTimes 100
- CircuitBuildTimeBin 1 50
- CircuitBuildTimeBin 2 25
- CircuitBuildTimeBin 3 13
+ CircuitBuildTimeBin 0 50
+ CircuitBuildTimeBin 50 25
+ CircuitBuildTimeBin 100 13
...
- Reading the histogram in will entail multiplying each bin by the
- BUILDTIME_BIN_WIDTH and then inserting <count> values into the
- circuit_build_times array each with the value of
- <bin>*BUILDTIME_BIN_WIDTH. In order to evenly distribute the
- values in the circular array, a form of index skipping must
- be employed. Values from bin #N with bin count C and total T
- will occupy indexes specified by N+((T/C)*k)-1, where k is the
- set of integers ranging from 0 to C-1.
-
- For example, this would mean that the values from bin 1 would
- occupy indexes 1+(100/50)*k-1, or 0, 2, 4, 6, 8, 10 and so on.
- The values for bin 2 would occupy positions 1, 5, 9, 13. Collisions
- will be inserted at the first empty position in the array greater
- than the selected index (which may requiring looping around the
- array back to index 0).
+ Reading the histogram in will entail inserting <count> values
+ into the circuit_build_times array each with the value of
+ <bin-ms> milliseconds. In order to evenly distribute the values
+ in the circular array, the Fisher-Yates shuffle will be performed
+ after reading values from the bins.
Learning the CircuitBuildTimeout
@@ -77,14 +63,28 @@ Implementation
fitting the data using the estimators at
http://en.wikipedia.org/wiki/Pareto_distribution#Parameter_estimation.
- The timeout itself will be calculated by solving the CDF for the
- a percentile cutoff BUILDTIME_PERCENT_CUTOFF. This value
- represents the percentage of paths the Tor client will accept out of
- the total number of paths. We have not yet determined a good
- cutoff for this mathematically, but 85% seems a good choice for now.
+ The timeout itself is calculated by using the Quartile function (the
+ inverted CDF) to give us the value on the CDF such that
+ BUILDTIME_PERCENT_CUTOFF (80%) of the mass of the distribution is
+ below the timeout value.
- From http://en.wikipedia.org/wiki/Pareto_distribution#Definition,
- the calculation we need is pow(BUILDTIME_PERCENT_CUTOFF/100.0, k)/Xm.
+ Thus, we expect that the Tor client will accept the fastest 80% of
+ the total number of paths on the network.
+
+ Detecting Changing Network Conditions
+
+ We attempt to detect both network connectivty loss and drastic
+ changes in the timeout characteristics. Network connectivity loss
+ is detected by recording a timestamp every time Tor either completes
+ a TLS connection or receives a cell. If this timestamp is more than
+ 90 seconds in the past, circuit timeouts are no longer counted.
+
+ If more than MAX_RECENT_TIMEOUT_RATE (80%) of the past
+ RECENT_CIRCUITS (20) time out, we assume the network connection
+ has changed, and we discard all buildtimes history and compute
+ a new timeout by estimating a new Pareto curve using the
+ position on the Pareto Quartile function for the ratio of
+ timeouts.
Testing
@@ -104,7 +104,7 @@ Implementation
of a new circuit, and the hard cutoff triggers destruction of the
circuit.
- Good values for hard and soft cutoffs seem to be 85% and 65%
+ Good values for hard and soft cutoffs seem to be 80% and 60%
respectively, but we should eventually justify this with observation.
When to Begin Calculation
--
1.5.6.5
More information about the tor-commits
mailing list