[or-cvs] r18877: {projects} flesh out section 5 add a conclusion fold in comments from n (projects/performance)
arma at seul.org
Wed Mar 11 10:21:37 UTC 2009
Author: arma
Date: 2009-03-11 06:21:37 -0400 (Wed, 11 Mar 2009)
New Revision: 18877
Modified:
projects/performance/performance.tex
Log:
flesh out section 5
add a conclusion
fold in comments from nick and andrew
start mucking with the lessons-from-economics section
Modified: projects/performance/performance.tex
===================================================================
--- projects/performance/performance.tex 2009-03-11 05:41:12 UTC (rev 18876)
+++ projects/performance/performance.tex 2009-03-11 10:21:37 UTC (rev 18877)
@@ -55,7 +55,7 @@
\maketitle
-As the Tor project has grown, the performance of the Tor network has
+As Tor's user base has grown, the performance of the Tor network has
suffered. This document describes our current understanding of why Tor
is slow, and lays out our options for fixing it.
@@ -89,7 +89,7 @@
thinking hard about how to safely collect metrics about network
performance. But it's becoming increasingly clear that we're not going
to produce the perfect answers just by thinking hard. We need to roll
-out some attempts at solutions, and use the experience to get a better
+out some attempts at solutions, and use the experience to get better
intuition about how to really solve the problems.
%better understanding of anonymity (research)
@@ -97,8 +97,8 @@
We've identified six main reasons why the Tor network is slow.
Problem \#1 is that Tor's congestion control does not work well. We need
-to come up with ways to let the ``quiet'' streams (like web browsing)
-co-exist better with the ``loud'' streams (like bulk transfer).
+to come up with ways to let ``quiet'' streams like web browsing
+co-exist better with ``loud'' streams like bulk transfer.
Problem \#2 is that some Tor users simply put too much traffic onto the
network relative to the amount they contribute, so we need to work on
ways to limit the effects of those users and/or provide priority to the
@@ -108,21 +108,23 @@
develop strategies for increasing the overall community of relays, and
consider introducing incentives to make the network more self-sustaining.
Problem \#4 is that Tor's current path selection algorithms don't actually
-distribute load correctly over the network, introducing hotspots. We need
+distribute load correctly over the network, meaning some relays are
+overloaded and some are underloaded. We need
to develop ways to more accurately estimate the properties of each relay,
and also ways for clients to select paths more fairly.
Problem \#5 is that Tor clients aren't as good as they should be at
-handling high or variable latency and connection failures. We must come
-up with heuristics for clients to automatically shift away from bad
+handling high or variable latency and connection failures. We need better
+heuristics for clients to automatically shift away from bad
circuits, and other tricks for them to dynamically adapt their behavior.
-Problem \#6 is that low-bandwidth users spend too much of their bandwidth
+Problem \#6 is that low-bandwidth users spend too much of their network
overhead downloading directory information. We've made a serious dent
in this problem already, but more work remains here too.
We discuss each reason more in its own section below. For each section,
we explain our current intuition for how to address the problem,
how effective we think each fix would be, how much effort and risk is
-involved, and the recommended next steps.
+involved, and the recommended next steps, all with an eye to what
+can be accomplished in 2009.
While all six categories need to be resolved in order to make the Tor
network fast enough to handle everyone who wants to use it, we've ordered
@@ -242,7 +244,7 @@
to something we can deploy. The next step on our side is to deploy
a separate testing Tor network that uses datagram protocols, based on
patches from Joel and others, and get more intuition from that. We could
-optimistically have this network deployed in late 2009.
+optimistically have this testbed network deployed in late 2009.
\subsection{We chose Tor's congestion control window sizes wrong}
%Changing circuit window size
@@ -253,7 +255,7 @@
congested, and so the originator stops sending.
Kiraly proposed~\cite{circuit-window,tor-l3-approach} that reducing
this window size would substantially decrease latency (although not
-to the same extent as moving to a unreliable link protocol), while not
+to the same extent as moving to an unreliable link protocol), while not
affecting throughput.
Specifically, right now the circuit window size is 512KB and the
@@ -270,10 +272,10 @@
More investigation is needed on precisely what should be the new value
for the circuit window, and whether it should vary.
-Out of 200, 1\,000 (current value in Tor) and 5\,000, the optimum was
-200 for all levels of packet loss. (I'm confused, how do these numbers map to the 512KB and 256KB numbers 2 paragraphs above?)
+Out of 100KB, 512KB (current value in Tor) and 2560KB, they found the
+optimum was 100KB for all levels of packet loss.
However this was only evaluated for a fixed network latency and relay
-bandwidth, and where all users had the same \texttt{CIRCWINDOW} value.
+bandwidth, where all users had the same \texttt{CIRCWINDOW} value.
Therefore, a different optimum may exist for networks with different
characteristics, and during the transition of the network to the new value.
@@ -290,7 +292,7 @@
{\bf Plan}: Once we start on Tor 0.2.2.x (in the next few months), we should
put the patch in and see how it fares. We should go for maximum effect,
-and choose the lowest possible setting of 100 cells (50KB) per chunk.
+and choose the lowest possible window setting of 100 cells (50KB).
%\subsection{Priority for circuit control cells, e.g. circuit creation}
@@ -347,8 +349,12 @@
want to be able to enforce some per-circuit quality-of-service properties.
This meddling is tricky though: we could encounter feedback effects if
-we don't perfectly anticipate the results of our changes.
-(feedback effects such as?)
+we don't perfectly anticipate the results of our changes. For example,
+we might end up squeezing certain classes of circuits too far, causing
+those clients to build too many new circuits in response.
+Or we might simply squeeze all circuits too much, ruining the network
+for everybody.
+% XXX
Also, Bittorrent is designed to resist attacks like this -- it
periodically drops its lowest-performing connection and replaces it with
@@ -389,7 +395,9 @@
we going to get for throttling or blocking certain content? And does the
capability to throttle certain content change the liability situation
for the relay operator?
-(Imagine an AG, ``I demand you block all things immoral, or provide me the API to do it myself; you clearly have the ability. Also anyone who doesn't is going to get served.'')
+%(Imagine an AG, ``I demand you block all things immoral, or provide me
+%the API to do it myself; you clearly have the ability. Also anyone who
+%doesn't is going to get served.'')
{\bf Impact}: Medium-high.
@@ -597,7 +605,9 @@
operators, to give them a better sense of community, to answer questions
and concerns more quickly, etc.
-(What about offering paid support options so relay operators have a place to go for help? Such as corporations or universities running relays get direct phone, email, IM support options?)
+We should also consider offering paid or subsidized support options so
+relay operators have a place to go for help. Corporations and universities
+running relays could get direct phone, email, or IM support options.
\subsubsection{A Facebook app to show off your relay}
@@ -609,11 +619,24 @@
Opportunities for expansion include allowing relay operators to form
``teams'', and for these teams to be ranked on the contribution to the
-network. This competition may give more encouragement for team members to
+network. (Real world examples here include the SETI screensaver and the
+MD5 hash crack challenges.) This competition may give more encouragement
+for team members to
increase their contribution to the network. Also, when one of the team
members has their relay fail, other team members may notice and provide
-assistance on fixing the problem. (Real world examples being SETI screensaver and the MD5 hash crack challenges? Would this also introduce an incentive to cheat to be the top team?)
+assistance on fixing the problem.
+% Would this also introduce an incentive to cheat to be the top team?) -AL
+% Yes. -RD
+\subsubsection{Look for new ways to get people to run relays}
+
+We are not primarily social engineers, and the people we are good at
+convincing to set up relays make up a fairly small group.
+
+We need to keep an eye out for more creative ways to encourage a broader
+class of users to realize that helping out by operating a relay will
+ultimately be something they want to do.
+
\subsection{Funding more relays directly}
Another option is to directly pay hosting fees for fast relays (or
@@ -648,7 +671,7 @@
our time and effort are better spent on design and coding that will
have long-term impact rather than be recurring costs.
-\subsection{Fast Tor relays on Windows}
+\subsection{Handling fast Tor relays on Windows}
\label{sec:overlapped-io}
Advocating that users set up relays is all well and good, but if most
@@ -763,7 +786,12 @@
``what do we mean by sufficiently?'', that we'll just have to guess about.
The third phase is to actually sort out how to construct and distribute
gold-star cryptographic certificates that entry relays can verify.
-(Is this just for public relays? If I offer 10 bridges, do I lose?)
+
+Notice that with the new certificates approach, we can reward users
+who contribute to the network in other ways than running a fast public
+relay -- examples might include top sponsors, users who run stable bridge
+relays, translators, people who fix bugs, etc.
+
{\bf Impact}: Medium-high.
{\bf Effort}: Medium-high.
@@ -782,7 +810,7 @@
Even if we don't add in an incentive scheme, simply making suitable
users into relays by default should do a lot for our capacity problems.
-We've made many steps toward this goal already, with the automated
+We've made many steps toward this goal already, with automated
reachability testing, bandwidth estimation, UPnP support built in to
Vidalia, and so on.
@@ -804,10 +832,12 @@
for what to do if we get more relays than our current directory scheme
can handle is to publish only the best relays, for some metric of best
that considers capacity, expected uptime, etc. That should be a perfectly
-adequate stopgap measure. Besides, if we get into the position of having
-too many relays, we'll want to look at the distribution and properties
-of the relays we have when deciding what algorithms would best make use
-of them.
+adequate stopgap measure. The step after that would be to consider
+splintering the network into two networkstatus documents, with clients
+flipping a coin to decide which one they use. Ultimately, if we are so lucky as
+to get into the position of having too many relays, we'll want to look
+at the distribution and properties of the relays we have when deciding
+what algorithms would best make use of them.
{\bf Impact}: High.
@@ -855,7 +885,8 @@
In particular, we can estimate the network load because all Tor relays
publish both their capacity and usage in their relay descriptor (but
-see the next section for problems that crop up there). The Tor network
+see \prettyref{sec:better-bandwidth-estimates} for problems that crop
+up there). The Tor network
is currently loaded at around 50\%. This level is much higher than most
reasonable networks, indicating that our plan in \prettyref{sec:capacity}
to get more overall capacity is a good one. But 50\% is quite far from
@@ -977,6 +1008,7 @@
%optimization.
\subsection{The bandwidth estimates we have aren't very accurate}
+\label{sec:better-bandwidth-estimates}
Weighting relay selection by bandwidth only works if we can accurately
estimate the bandwidth for each relay.
@@ -1227,8 +1259,13 @@
\section{Clients need to handle variable latency and failures better}
+The next issue we need to tackle is that Tor clients aren't as good
+as they should be at handling high or variable latency and connection
+failures. First, we need ways to smooth out the latency that clients see.
+Then, for the cases where we can't smooth it out enough, we need better
+heuristics for clients to automatically shift away from bad circuits,
+and other tricks for them to dynamically adapt their behavior.
-
\subsection{Our round-robin and rate limiting is too granular}
Tor's rate limiting uses a token bucket approach to enforce a long-term
@@ -1250,13 +1287,15 @@
\includegraphics[width=\textwidth]{extensiontimes}
\caption{Number of seconds it takes to establish each hop of a 3-hop
circuit. The higher density of samples around 2s, 3s, etc.\ indicates that
-Tor's rate limiting is introducing extra delay into the responses.}
+rate limiting at each relay is introducing extra delay into the
+responses.}
\label{fig:extension-times}
\end{figure}
-Our original theory was that one-second granularity should be sufficient:
+Our original theory when designing Tor's rate limiting was that one-second
+granularity should be sufficient:
cells will go out as quickly as possible while the bucket still has
-tokens for that second, and once it's empty there's nothing we can do
+tokens, and once it's empty there's nothing we can do
but wait until the next second for permission to send more cells.
We should explore refilling the buckets more often than once a second,
@@ -1271,8 +1310,8 @@
so it isn't useful to think in units smaller than that. Also, every
network write operation carries with it overhead from the TLS record,
the TCP header, and the IP packet header. Finally, network transmission
-unit (MTU) sizes vary, but if we could use a larger packet on the wire,
-then we're not being as efficient as we could be.
+unit (MTU) sizes vary, but if we could use a larger packet on the wire
+and we don't, then we're not being as efficient as we could be.
%also, let's say we have 15 connections that want attention
%and we have n tokens for the second to divide up among them
@@ -1299,47 +1338,144 @@
\prettyref{sec:squeeze}, we should keep the option of higher-resolution
rate-limiting in mind.
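To make the granularity argument concrete, here is a minimal Python sketch of how long a sender waits once its token bucket has run dry, as a function of the refill interval. It is an illustration under stated assumptions, not Tor's implementation: the function name, the integer-millisecond clock, and the example rate are all hypothetical, and the model ignores burst allowances and the per-write overhead discussed above.

```python
def wait_ms(rate_bytes_per_sec, refill_interval_ms, needed_bytes, now_ms):
    """Milliseconds until `needed_bytes` tokens are available.

    Assumes (hypothetically) that the bucket is empty at `now_ms` and that
    it is topped up at every multiple of `refill_interval_ms` with that
    interval's share of the long-term rate.
    """
    per_refill = rate_bytes_per_sec * refill_interval_ms // 1000
    if per_refill <= 0:
        raise ValueError("refill interval too short for this rate")
    # Ceiling division: number of refills needed to cover the request.
    refills_needed = -(-needed_bytes // per_refill)
    # First refill boundary strictly after `now_ms`.
    next_refill = (now_ms // refill_interval_ms + 1) * refill_interval_ms
    return next_refill + (refills_needed - 1) * refill_interval_ms - now_ms


CELL_BYTES = 512          # size of one Tor cell
RATE = 100 * CELL_BYTES   # hypothetical relay limit: 100 cells per second

# Bucket runs dry 250 ms into the second; one more cell wants to go out.
coarse = wait_ms(RATE, 1000, CELL_BYTES, 250)  # refill once per second
fine = wait_ms(RATE, 100, CELL_BYTES, 250)     # refill ten times per second
```

With these example numbers the cell waits for the next whole-second boundary under once-per-second refills, but only for the next 100 ms boundary under the finer schedule, while the long-term average rate is the same in both cases.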
-\subsection{The switch to Polipo: prefetching, pipelining, etc}
+%\subsection{The switch to Polipo: prefetching, pipelining, etc}
-Polipo makes Tor appear faster, for some web browsing activities. Yay.
-We should continue our plans to migrate to it. (Polipo needs a more active maintainer, and someone with unix to Windows porting experience to make polipo work correctly in Windows. Alternatively, could privoxy include some of the polipo features we like?)
+%Polipo makes Tor appear faster, for some web browsing activities. Yay.
+%We should continue our plans to migrate to it. (Polipo needs a more
+%active maintainer, and someone with unix to Windows porting experience
+%to make polipo work correctly in Windows. Alternatively, could privoxy
+%include some of the polipo features we like?)
\subsection{Better timeouts for giving up on circuits and trying a new one}
-Proposal 151 suggests that clients should estimate their normal circuit extension time, and give up on circuits which are taking substantially longer.
-This should hopefully reduce load on overloaded nodes, and also improve performance for clients.
+Some circuits are established very quickly, and some circuits take many
+seconds to form. The time it takes for the circuit to open can give us
+a hint about how well that circuit will perform for future traffic. We
+should discard extremely slow circuits early, so clients never have to
+even try them.
-\subsection{When a circuit has too many streams on it, move to a new one}
+The question, though, is how to decide the right timeouts? If we set a
+static timeout in the clients, then choosing a number that's too low will
+cause clients to discard too many circuits. Worse, clients on really bad
+connections will never manage to establish a circuit. On the other hand,
+setting a number that's too high won't change the status quo much.
-This would prevent any single circuit from getting too overloaded.
+Fallon Chen worked during her Google-Summer-of-Code-2008 internship with
+us on collecting data about how long it takes for clients to establish
+circuits, and analyzing the data to decide what shape the distribution
+has (it appears to be a Pareto distribution). The goal is for clients
+to track their own circuit build times, and then be able to recognize
+if a circuit has taken longer than it should have compared to the
+previous circuits. That way clients with fast connections can discard
+not-quite-fast-enough circuits, whereas clients with slow connections
+can discard only the really-very-slow circuits. Not only do clients get
+better performance, but we can also dynamically adapt our paths away
+from overloaded relays.
-But actually, this idea would benefit high-volume flows most, so
-it is a bad idea. We should not do it.
+Mike and Fallon wrote a
+proposal\footnote{\url{https://svn.torproject.org/svn/tor/trunk/doc/spec/proposals/151-path-selection-improvements.txt}}
+explaining the details of how to collect the stats, how many data points
+the client needs before it has a good sense of the expected build times,
+and so on.
+Further, there's another case in Tor where adaptive timeouts would be
+smart: how long we wait in between trying to attach a stream to a given
+circuit and deciding that we should try a new circuit. Right now we
+have a crude and static ``try 10 seconds on one, then try 15 seconds
+on another'' algorithm, which is both way too high and way too low,
+depending on the context.
+
+{\bf Impact}: Medium.
+
+{\bf Effort}: Medium, but we're already part-way through it.
+
+{\bf Risk}: Low, unless we've mis-characterized the distribution of
+circuit extend times, in which case clients end up discarding too many
+circuits.
+
+{\bf Plan}: We should deploy the changes in clients in Tor 0.2.2.x to
+collect circuit times, and see how that goes. Then we should gather data
+about stream timeouts to build a plan for how to resolve the second piece.
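The adaptive-timeout idea above can be sketched in a few lines of Python. This is only an illustration, assuming the build times really are Pareto-distributed: it fits the distribution by maximum likelihood and returns a chosen quantile as the cutoff. The function name and the 80th-percentile default are assumptions made here for illustration; the proposal may specify a different estimator, cutoff, or minimum sample count.

```python
import math


def pareto_timeout(build_times, quantile=0.8):
    """Estimate a circuit build timeout (seconds) from a client's own samples.

    Fits a Pareto distribution by maximum likelihood and returns the time
    within which `quantile` of builds are expected to finish. The default
    quantile is an arbitrary choice for this sketch.
    """
    if len(build_times) < 2:
        raise ValueError("need at least two samples")
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    x_min = min(build_times)
    if x_min <= 0:
        raise ValueError("build times must be positive")
    log_sum = sum(math.log(t / x_min) for t in build_times)
    if log_sum == 0:
        raise ValueError("samples are all identical; cannot fit a shape")
    alpha = len(build_times) / log_sum  # MLE of the Pareto shape parameter
    # Invert the CDF F(x) = 1 - (x_min / x) ** alpha at the chosen quantile.
    return x_min * (1.0 - quantile) ** (-1.0 / alpha)


# Hypothetical samples: a client on a fast link and one on a slow link.
fast_client = [1.0, 1.2, 1.5, 2.0, 3.0]
slow_client = [4.0, 5.0, 6.5, 9.0, 15.0]
```

Because each client fits the distribution to its own history, the fast client ends up with a much tighter cutoff than the slow client, which is the behavior the text describes.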
+
+%\subsection{When a circuit has too many streams on it, move to a new one}
+
+%This would prevent any single circuit from getting too overloaded.
+
+%But actually, this idea would benefit high-volume flows most, so
+%it is a bad idea. We should not do it.
+
\subsection{If extending a circuit fails, try extending a few other
places before abandoning the circuit.}
-This should cut down on the total number of extend attempts in the
-network, which is good since some of our other schemes involve increasing
-that number.
+Right now, when any extend operation fails, we abandon the entire
+circuit. As the reasoning goes, any other approach allows an attacker
+who controls some relays (or part of the network) to dictate our circuits
+(by declining to extend except to relays that he can successfully attack).
+However, this reasoning is probably too paranoid. If we try at most three
+times for each hop, we greatly increase the odds that we can reuse the
+work we've already done, but we don't much increase the odds that an
+attacker will control the entire circuit.
+
+Overall, this modification should cut down on the total number of extend
+attempts in the network. This result is particularly helpful since some
+of our other schemes in this document involve increasing that number.
+
+{\bf Impact}: Low.
+
+{\bf Effort}: Low.
+
+{\bf Risk}: Low-medium. We need to actually do some computations to
+confirm that the risk of whole-path compromise is as low as we think
+it is.
+
+{\bf Plan}: Do the computations, then write a proposal, then do it.
+
\subsection{Bundle the first data cell with the begin cell}
-This would be great for latency and time-to-page-starts-getting-rendered.
-But it's hard because SOCKS wants the handshake to complete before you're
-allowed to send data. we could hack polipo to optimistically send it
-anyway, since we ship with polipo. Seems like a risky move, but quite a
-good payoff.
+In Tor's current design, clients send a ``relay begin'' cell to specify
+the intended destination for our stream, and then wait for a ``relay
+connected'' cell to confirm the connection is established. Only then
+do they complete the SOCKS handshake with the local application, and
+start reading application traffic.
+We could modify our local proxy protocol in the case of Privoxy or Polipo
+so it sends the web request to the SOCKS port during the handshake. Then
+we could optimistically include the first cell worth of application data
+in the original begin cell. This trick would allow us to cut out an entire
+network round-trip every time we establish a new connection through Tor.
+The result would be quicker page loads for users.
+
+Alas, this trick would involve extending the SOCKS protocol, which isn't
+usually a polite strategy when it comes to interoperating with other
+applications. On the other hand, it should be possible to extend it in a
+backwards-compatible way: applications that don't know about the trick
+would still behave the same and still work fine (albeit in a degraded
+mode where they waste a network round-trip).
+
+{\bf Impact}: Medium.
+
+{\bf Effort}: Medium.
+
+{\bf Risk}: Low.
+
+{\bf Plan}: Overall, it seems like a risky move, but with potentially
+quite a good payoff. I'm not convinced either way.
+
\section{Network overhead too high for modem users}
\subsection{We've made progress already at directory overhead}
\label{sec:directory-overhead}
-Tor clients must download information on the network, before they can start building connections.
-The current directory format (version 3) is already gives a substantial reduction in size.
-However, more improvements are possible and Proposal 158, which further reduces the directory overhead, is scheduled to be deployed in the Tor 0.2.2.x series.
-Further background on the directory overhead was given in a blog post~\footnote{\url{https://blog.torproject.org/blog/overhead-directory-info\%3A-past\%2C-present\%2C-future}}
+Tor clients must download information about the network before they can
+start building connections.
+The current directory format (version 3) already gives a substantial
+reduction in size.
+However, more improvements are possible and Proposal 158, which further
+reduces the directory overhead, is scheduled to be deployed in the Tor
+0.2.2.x series.
+Further background on directory overhead progress is given in our blog
+post\footnote{\url{https://blog.torproject.org/blog/overhead-directory-info\%3A-past\%2C-present\%2C-future}}.
\subsection{TLS overhead also can be improved}
@@ -1405,10 +1541,23 @@
\subsection{Lessons from economics}
\label{sec:economics}
-If, for example, the measures above doubled the effective capacity of the Tor network, the na\"{\i}ve hypothesis is that users would experience twice the throughput.
-Unfortunately this is not true, because it assumes that the number of users does not vary with bandwidth available.
-In fact, as the supply of the Tor network's bandwidth increases, there will be a corresponding increase in the demand for bandwidth from Tor users.
-Simple economics shows that performance of Tor, and other anonymization networks, are controlled by how the number of users scales with available bandwidth, which can be represented by a demand curve.\footnote{This section is based on a blog post published in Light Blue Touchpaper~\cite{economics-tor} and the property discussed was also observed by Andreas Pfitzmann in response to a presentation at the PET Symposium~\cite{wendolsky-pet2007}}.
+Imagine the solutions above double the effective capacity of the Tor
+network. The na\"{\i}ve hypothesis is that users would then experience
+twice the throughput.
+Unfortunately this is not true, because it assumes that the number of
+users does not vary with bandwidth available.
+In fact, as the supply of the Tor network's bandwidth increases, there
+will be a corresponding increase in the demand for bandwidth from
+Tor users.
+Simple economics shows that performance of Tor and other
+anonymization networks is controlled by how the number of users
+scales with available bandwidth; this relationship can be represented
+by a demand
+curve.\footnote{The economics discussion is based on a blog post published
+in Light
+Blue Touchpaper~\cite{economics-tor}. The property discussed was also
+observed by Andreas Pfitzmann in response to a presentation at the PET
+Symposium~\cite{wendolsky-pet2007}.}
\begin{figure}
\includegraphics{equilibrium}
@@ -1419,64 +1568,143 @@
\label{fig:equilibrium}
\end{figure}
-\prettyref{fig:equilibrium} is the typical supply and demand graph from economics textbooks, except with long-term throughput per user substituted for price, and number of users substituted for quantity of goods sold.
-Also, it is inverted, because users prefer higher throughput, whereas consumers prefer lower prices.
-Similarly, as the number of users increases, the bandwidth supplied by the network falls, whereas suppliers will produce more goods if the price is higher.
+\prettyref{fig:equilibrium} is the typical supply and demand graph from
+economics textbooks, except with long-term throughput per user substituted
+for price, and number of users substituted for quantity of goods sold.
+Also, it is inverted, because users prefer higher throughput, whereas
+consumers prefer lower prices.
+Similarly, as the number of users increases, the bandwidth supplied
+by the network falls, whereas suppliers will produce more goods if the
+price is higher.
-In drawing the supply curve, I have assumed the network's bandwidth is constant and shared equally over as many users as needed.
-The shape of the demand curve is much harder to even approximate, but for the sake of discussion, I have drawn three alternatives.
-We will return to these assumptions later.
-The number of Tor users and the throughput they each get is the intersection between the supply and demand curves -- the equilibrium.
-If the number of users is below this point, more users will join and the throughput per user will fall to the lowest tolerable level.
-Similarly, if the number of users is too high, some will be getting lower throughput than their minimum, so will give up, improving the network for the rest of the users.
+In drawing the supply curve, we have assumed the network's bandwidth is
+constant and shared equally over as many users as needed.
+The shape of the demand curve is much harder to even approximate, but
+for the sake of discussion, we have drawn three alternatives.
+The number of Tor users and the throughput they each get is the
+intersection between the supply and demand curves -- the equilibrium.
+If the number of users is below this point, more users will join and
+the throughput per user will fall to the lowest tolerable level.
+Similarly, if the number of users is too high, some will be getting
+lower throughput than their minimum, so will give up, improving the
+network for the rest of the users.
-Now assume Tor's bandwidth grows by 50\% -- the supply curve shifts, as shown in the figure.
-By comparing how the equilibrium moves, we can see how the shape of the demand curve affects the performance improvement that Tor users see.
-If the number of users is independent of performance, shown in curve A, then everyone gets a 50\% improvement, which matches the na\"{\i}ve hypothesis.
-More realistically, the number of users increases, so the performance gain is less and the shallower the curve gets, the smaller the performance increase will be.
-For demand curve B, there is a 18\% increase in the number of Tor users and a 27\% increase in throughput; whereas with curve C there are 33\% more users and so only a 13\% increase in throughput for each user.
+Now assume Tor's bandwidth grows by 50\% -- the supply curve shifts,
+as shown in the figure.
+By comparing how the equilibrium moves, we can see how the shape of the
+demand curve affects the performance improvement that Tor users see.
+If the number of users is independent of performance, shown in curve A,
+then everyone gets a 50\% improvement, which matches the na\"{\i}ve
+hypothesis.
+More realistically, the number of users increases, so the performance
+gain is smaller; the shallower the demand curve, the smaller the
+performance increase will be.
+For demand curve B, there is an 18\% increase in the number of Tor users
+and a 27\% increase in throughput; whereas with curve C there are 33\%
+more users and so only a 13\% increase in throughput for each user.
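As a quick check of these figures under the stated supply assumption
(constant total bandwidth shared equally), write $S$ for the network's
total bandwidth, $N$ for the number of users, and $T = S/N$ for the
throughput per user; these symbols are introduced here for illustration
only. If bandwidth grows to $1.5S$ and the user count moves to $N'$, then
\[
\frac{T'}{T} = \frac{1.5\,S/N'}{S/N} = 1.5\,\frac{N}{N'} .
\]
For curve A, $N' = N$ gives $T'/T = 1.5$; for curve B, $N' = 1.18N$
gives $1.5/1.18 \approx 1.27$; and for curve C, $N' = 1.33N$ gives
$1.5/1.33 \approx 1.13$.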
-In an extreme case where the demand curve points down (not shown), as the network bandwidth increases, performance for users will fall.
-Products exhibiting this type of demand curve, such as designer clothes, are known as Veblen goods.
-As the price increases, their value as status symbols grows, so more people want to buy them.
-I don't think it is likely to be the case with Tor, but there could be a few users who might think that the slower the network is, the better it is for anonymity.
+In an extreme case where the demand curve points down (not shown),
+as the network bandwidth increases, performance for users will fall.
+Products exhibiting this type of demand curve, such as designer clothes,
+are known as Veblen goods.
+As the price increases, their value as status symbols grows, so more
+people want to buy them.
+I don't think it is likely to be the case with Tor, but there could be
+a few users who might think that the slower the network is, the better
+it is for anonymity.
-To keep the explanation simple, I have made quite a few assumptions, some more reasonable than others.
-For the supply curve, I assume that all Tor's bandwidth goes into servicing user requests, it is shared fairly between users, there is no overhead when the number of Tor clients grows, and the performance bottleneck is the network, not clients.
-I don't think any of these are true, but the difference between the ideal case and reality might not be significant enough to nullify the analysis.
-The demand curves are basically guesswork -- it's unlikely that the true one is as nicely behaved as the ideal ones shown.
-It more likely will be a combination of the different classes, as different user communities come into relevance.
+To keep the explanation simple, I have made quite a few assumptions,
+some more reasonable than others.
+For the supply curve, I assume that all Tor's bandwidth goes into
+servicing user requests, it is shared fairly between users, there is
+no overhead when the number of Tor clients grows, and the performance
+bottleneck is the network, not clients.
+I don't think any of these are true, but the difference between the ideal
+case and reality might not be significant enough to nullify the analysis.
+The demand curves are basically guesswork -- it's unlikely that the true
+one is as nicely behaved as the ideal ones shown.
+It more likely will be a combination of the different classes, as
+different user communities come into relevance.
-I glossed over the aspect of reaching equilibrium -- in fact it could take some time between the network bandwidth changing and the user population reaching stability.
-If this period is sufficiently long and network bandwidth is sufficiently volatile it might never reach equilibrium.
+I glossed over the aspect of reaching equilibrium -- in fact it could
+take some time between the network bandwidth changing and the user
+population reaching stability.
+If this period is sufficiently long and network bandwidth is sufficiently
+volatile it might never reach equilibrium.
I've also ignored effects which shift the demand curve.
-In normal economics, marketing makes people buy a product even though they considered it too expensive.
-Similarly, a Slashdot article or news of a privacy scandal could make Tor users more tolerant of the poor performance.
-Finally, the user perception of performance is an interesting and complex topic, which I've not covered here.
-I've assumed that performance is equivalent to throughput, but actually latency, packet loss, predictability, and their interaction with TCP/IP congestion control are important components too.
+In normal economics, marketing makes people buy a product even though
+they considered it too expensive.
+Similarly, a Slashdot article or news of a privacy scandal could make
+Tor users more tolerant of the poor performance.
+Finally, the user perception of performance is an interesting and complex
+topic, which I've not covered here.
+I've assumed that performance is equivalent to throughput, but actually
+latency, packet loss, predictability, and their interaction with TCP/IP
+congestion control are important components too.
\subsubsection{Differential pricing for Tor users}
-The above discussion has argued that the speed of an anonymity network will converge on the slowest level that the most tolerant users will consider usable.
-This is problematic because there are is significant variation in levels of tolerance between different users and different protocols.
-Most notably, file sharing users are subject to high profile legal threats, and do not require interactive traffic, so will continue to use a network even if the performance is considerably lower than the usable level for web browsing.
+The above discussion has argued that the speed of an anonymity network
+will converge on the slowest level that the most tolerant users will
+consider usable.
+This is problematic because there is significant variation in levels
+of tolerance between different users and different protocols.
+Most notably, file sharing users are subject to high profile legal
+threats, and do not require interactive traffic, so will continue to use
+a network even if the performance is considerably lower than the usable
+level for web browsing.
-In conventional markets, this type of problem is solved by differential pricing, for example different classes of seat on airline flights.
-In this model, several equilibrium points are allowed to form, and the one chosen will depend on the cost/benefit tradeoffs of the customers.
-A similar strategy could be used for Tor, allowing interactive web browsing users to get higher performance, while forcing bulk data transfer users to have lower performance (but still tolerable for them).
-Alternatively, the network could be configured to share resources in a manner such that the utility to each user is more equal.
-In this case, it will be acceptable to all users that a single equilibrium point is formed, because its level will no longer be in terms of simple bandwidth.
+In conventional markets, this type of problem is solved by differential
+pricing, for example different classes of seat on airline flights.
+In this model, several equilibrium points are allowed to form, and the
+one chosen will depend on the cost/benefit tradeoffs of the customers.
+A similar strategy could be used for Tor, allowing interactive web
+browsing users to get higher performance, while forcing bulk data transfer
+users to have lower performance (but still tolerable for them).
+Alternatively, the network could be configured to share resources in a
+manner such that the utility to each user is more equal.
+In this case, it will be acceptable to all users that a single equilibrium
+point is formed, because its level will no longer be in terms of simple
+bandwidth.
\prettyref{sec:too-much-load} is an example of the former strategy.
-Web browsing users will be offered better performance, so we should attract more of them, but hopefully not so many that the performance returns to current levels.
-In constrast, bulk-traffic users will be given poorer performance, but since they are less sensitive to latency, it could be that they do not mind.
+Web browsing users will be offered better performance, so we should
+attract more of them, but hopefully not so many that the performance
+returns to current levels.
+In contrast, bulk-traffic users will be given poorer performance,
+but since they are less sensitive to latency, it could be that they do
+not mind.
\prettyref{sec:congestion} could be used to implement the latter strategy.
-If web-browsing users are more sensitive to latency than bandwidth, then we could optimize the network for latency rather than throughput.
+If web-browsing users are more sensitive to latency than bandwidth,
+then we could optimize the network for latency rather than throughput.
\subsection{The plan moving forward}
-Need ways to measure improvements
+Our next steps should be to work with funders and developers to turn
+this set of explanations and potential fixes into a roadmap: we need to
+lay out all the solutions, sort out the dependencies, assign developers
+to tasks, and get everything started.
+At the same time, we need to continue to work on ways to measure changes
+in the network: without snapshots for `before' and `after', we'll have
+a much tougher time telling whether a given idea is actually working.
+Many of the plans here have a delay between when we roll out the change
+and when the clients and relays have upgraded enough for the change to
+be noticeable. Since our timeframe requires rolling out several solutions
+at the same time, an increased focus on metrics and measurements will be
+critical to keeping everything straight.
+
+Lastly, we need to be aware that ramping up development on performance
+may need to push out or downgrade other items on our roadmap. So far,
+Tor has been focusing our development energy on the problems that funders
+are experiencing most severely at the time. This approach is good to make
+sure that we're always working on something that's actually important.
+But it also means that next year's critical items don't get as much
+attention as they should, and last year's critical items don't get
+as much maintenance as they should. Ultimately we need to work toward
+having consistent funding for core Tor development and maintenance as
+well as feature-oriented funding.
+
%\subsection*{Acknowledgements}
% Mike Perry provided many of the ideas discussed here