[tor-commits] [tor-browser-spec/master] Comit befour dooing speel chek.
mikeperry at torproject.org
Mon May 4 18:32:55 UTC 2015
commit 409b8c1a2f541be93b2b1126ba892a12411d3db9
Author: Mike Perry <mikeperry-git at torproject.org>
Date: Thu Apr 30 21:07:16 2015 -0700
Comit befour dooing speel chek.
YOLLO!
---
position-papers/HTTP3/HTTP3.tex | 367 +++++++++++++++++++++++++++++----------
1 file changed, 280 insertions(+), 87 deletions(-)
diff --git a/position-papers/HTTP3/HTTP3.tex b/position-papers/HTTP3/HTTP3.tex
index d9f2ab0..720bc25 100644
--- a/position-papers/HTTP3/HTTP3.tex
+++ b/position-papers/HTTP3/HTTP3.tex
@@ -29,98 +29,291 @@
\begin{abstract}
+The Tor Project has a keen interest in the development of future standards
+involving the HTTP application layer and associated transport layers. At
+minimum, we seek to ensure that all future HTTP standards remain compatible
+with the Tor Network, avoid introducing new third party tracking and
+linkability vectors, and minimize client fingerprintability. We also have a
+strong interest in the development of enhancements and/or extensions that
+protect the confidentiality and integrity of HTTP traffic, as well as provide
+resistance to traffic fingerprinting and general traffic analysis. We are
+actively researching these areas.
\end{abstract}
-
\section{Introduction}
-% XXX: Describe our organization? The Tor Project, Inc is a non-profit...
-
-% XXX: In this position paper, we describe the current and potential issues with
-
-Dangers and opportunities with resepect to browsing the Internet anonymously are
-often tied to the browser itself and not its underlying transport protocols:
-canvas fingerprinting, plugin enumeration and linking users via the DOM storage
-are just a few of the means the browser offers for trying to single users out.
-And even things like cookies and referrers, although belonging strictly speaking
-to HTTP, the transport protocol powering the web, are usually seen in the
-context of the browser itself as its additional policies shape the particular
-tracking potential of these and other transport related features.
-This is not much different for features in HTTP/2 although, compared to
-HTTP/1.1, it has a growing list of tracking risks that should be addressed in
-the specification itself. We discuss some of them below proposing ways to take
-these and other risks into account in future HTTP specifications.
-
-Apart from the dangers we just hinted at, beginning with HTTP/2 opportunities
-emerge as well: Using HTTP/2's flow control could make it easier to defend
-against adversaries sniffing a user's encrypted traffic and trying to extract
-information out of it by means of website traffic fingerprinting. We discuss
-current limitations and potential improvements for HTTP/3.
-
-\section{A Short Tracking Guide(1)}
-
-
-If we are talking about tracking on the Internet then we mainly have third-party
-tracking in mind. In this scenario the attacker has basically two mechanisms
-available: identifier based tracking (e.g. using cookies or cache cookies) and
-fingerprinting a user's device or environment.
-
-Additionally, we may encounter powerful parties that see a lot of a user's
-traffic due to being in a privileged position (e.g. search engines). They don't
-necessarily need to bother with third party tracking and would still be able to
-learn a lot of a user's details by correlating traffic which is endangering her
-anonymity.
-
-The defenses we develop in Tor Browser are:
-
-1) Binding identifiers to the URL bar domain. This is retaining functionality
-while preventing cross-origin identifier linkability: saving third party
-identifiers (e.g. via DOM storage) in a URL bar domain context does not make
-them available in a different URL bar domain context.
-2) Making users as uniform as possible while not breaking functionality.
-3) Providing a "New Identity" button that is clearing all browser state and
-giving the user a clean, new session.
-
-
-\subsection{User Tracking with HTTP}
-
-\subsection{Re-Using Connections}
-Coalescing connections might allow tracking users across origins just by means
-of HTTP. Together with a long keep-alive this might make it easy to correlate a
-lot of cross-domain traffic of a privacy conscious user even if she has
-JavaScript and third-party cookies disabled. Granted, having this feature in
-HTTP/2 is a big deal especially with respect to CDNs. But still we think
-allowing implementers to provide means to mitigate the issue directly in the
-specification seems worthwhile to do. This does not imply avoiding coalescing
-connections in the first place. Not at all. One could think about proposing a
-middle ground safe-guarding privacy while still providing advantages speed- and
-resource-wise: connections should not be reused across different URL bar
-domains.
-
-\subsection{Timing Side-Channels}
-
-PING and SETTINGS frames are acknowledged immediately by the client which might
-give servers the option to collect information about a client via timing
-side-channels. It is true, there are other means an attacker could use for the
-same purpose but they are either visible in the browser UI or users can disable
-them. As a countermeasure the specification could at least allow jitter in some
-cases (PING frames come to mind). If that is not an option one could specify
-that a client may close the connection to prevent timing side-channel attacks.
-
-
-\section{A Short Website Traffic Fingerprinting Guide}
-
-
-\subsection{Defending against Website Traffic Fingerprinting with HTTP}
-
-(1) For a detailed explanation including a theroretical background see:
-https://www.torproject.org/projects/torbrowser/design/.
-
-\section{Conclusions}
-
-
-\bibliographystyle{plain} \bibliography{W3C-DNT}
+The Tor Project, Inc is a US non-profit dedicated to providing technology and
+education to support online privacy, anonymity, and censorship circumvention.
+Our primary products are the Tor network software, and the Tor Browser, which
+is based on Firefox.
+
+In this position paper, we describe the concerns of the Tor Project with
+respect to future HTTP standardization. These concerns are broken down into
+six areas: identifier linkability, connection usage linkability,
+fingerprinting linkability, traffic confidentiality and integrity, traffic
+fingerprinting and traffic analysis, and Tor network compatibility.
+
+Each of these areas of concern is communicated in a separate section of this
+position paper. We have also performed a preliminary review of HTTP/2 with
+respect to these areas, and have noted our findings inline. We will be
+performing a more in-depth review of HTTP/2 for client fingerprinting and
+other tracking issues in the coming months.
+
+\section{Identifier Linkability}
+
+Identifier linkability is the ability to use any form of browser state, cache,
+data storage, or identifier to track (link) a user between two otherwise
+independent actions. For the purposes of this position paper, we are concerned
+with any browser state that persists beyond the duration of a single
+connection.
+
+The Tor Project has designed Tor Browser with two main properties for limiting
+identifier-based tracking: First Party Isolation, and Long Term Unlinkability.
+
+First Party Isolation is the property that a user's actions at one
+top-level URL bar domain cannot be correlated or linked to their actions on a
+different top-level URL bar domain. We maintain this property through a number
+of patches and modifications to various aspects of browser functionality and
+state keeping.
+% FIXME: Cite
+
+Long Term Unlinkability is the property that the user may completely clear all
+website-visible data and associated identifiers, such that their future
+activity cannot be linked or correlated to any activity prior to this action.
+Tor Browser provides Long Term Unlinkability by allowing the user to clear all
+browser tracking data in a single click (called "New Identity"). Our long-term
+goal is to allow users to define their relationship with individual first
+parties, and alter that relationship by resetting or blocking the associated
+tracking data on a per-site basis.
+
+\subsection{Identifier Linkability in HTTP/2}
+
+The Tor Project is still in the process of evaluating the stateful nature of
+HTTP/2 connections. It is likely that we will be able to isolate the usage of
+HTTP/2 connection state in a similar way to how we currently isolate HTTP
+connection state, as well as close these connections and clear that state when
+the user chooses to use a New Identity. However, it is not yet clear how
+complicated this isolation will be.
+
+\subsection{Avoiding Future Identifier Linkability}
+
+We feel that it is very important that mechanisms for identifier usage,
+storage, and connection-related state keeping be cleanly abstracted and
+narrowly scoped within the HTTP protocol. However, we also recognize that to a
+large degree identifier usage and the resulting linkability are primarily
+implementation details, and not specific to the protocol itself.
+
+Identifier linkability will become a problem if instances arise where the
+server is allowed to specify a setting or configuration property for a client
+that must persist beyond the duration of the session. In these cases, care
+must be taken to ensure that this data is cleared or isolated upon entry to
+private browsing mode, or when the user attempts to clear their private data.
+
+
+\section{Connection Usage Linkability}
+
+Connection usage linkability arises from the use of the same underlying
+transport stream for requests that would otherwise be independent due to the
+first party isolation of their associated identifiers and browser state.
+
+Tor Browser currently enforces connection usage unlinkability at the HTTP
+layer, by creating independent HTTP connections for third party hosts that
+are sourced from different first party domains.
+
+\subsection{Connection Usage Linkability with HTTP/2}
+
+The heavy use of connection multiplexing in HTTP/2 may present additional
+complexities for ensuring that requests are isolated. Unfortunately, unlike
+identifier usage, connection usage linkability is encouraged by the
+HTTP/2 specification in Section 9.1 (in the form of specifying that clients
+SHOULD NOT open more than one connection to a given host and port).
+
+\subsection{Avoiding Future Connection Usage Linkability}
+
+In the future, connection usage linkability may become a problem if the notion
+of a connection becomes further abstracted from the transport, and instead is
+enforced through a collection of identifiers or stateful behavior in the
+browser. This may tend to further encourage implementations that make it
+difficult to decouple the notion of a connection from the notion of a
+destination address.
+
+Even this is technically an implementation issue, but consideration should be
+taken to ensure that the specification does not encourage implementations to
+bake in deep assumptions about providing only a single connection instance per
+site, as was done for HTTP/2.
+
+\section{Fingerprinting Linkability}
+
+User agent fingerprinting arises from four sources: end-user configuration
+details, device and hardware characteristics, operating system vendor and
+version differences, and browser vendor and version differences.
+
+The Tor Project is primarily concerned with minimizing the ability of websites
+to obtain or infer end user configuration details and device characteristics.
+We concern ourselves with operating system fingerprinting only to the point of
+removing ways of detecting a specific operating system version. We make no
+attempt to address fingerprinting due to browser vendor and version
+differences. % FIXME: cite fingerprinting doc
+
+Under this model, it is unlikely that very many fingerprinting vectors that
+concern us will arise in the HTTP layer. However, it remains possible for
+end user configuration details to leak into the behavior of the HTTP
+layer.
+
+\subsection{Fingerprinting Linkability in HTTP/2}
+
+The Tor Project is still in the process of evaluating client
+fingerprintability in HTTP/2. The largest potential source of fingerprinting
+appears to be in the SETTINGS frame. If these values vary depending on end-user
+configuration, hardware capabilities, or operating system version, we may
+alter our implementation's behavior accordingly.
+
+\subsection{Avoiding Future Fingerprinting Linkability}
+
+It is conceivable that more fingerprinting vectors could arise in the future,
+especially if flow control and stream multiplexing decisions are delegated to
+the client, and depend on things like available system memory, available CPU
+cores, or other details. Care should be taken to avoid these situations,
+though we also expect them to be unlikely.
+
+\section{Traffic Confidentiality and Integrity}
+
+The Tor Project is very interested in any efforts to improve the
+confidentiality and integrity of the session layer of HTTP/3.
+
+In particular, we are strong advocates for mandatory authenticated encryption
+of HTTP/3 connections. The availability of entry-level authentication through
+the Let's Encrypt Project should eliminate the remaining barriers to requiring
+authenticated encryption, as opposed to deploying opportunistic mechanisms.
+
+We are also interested in efforts to encrypt the ClientHello and ServerHello
+messages in an initial forward-secure handshake, as described in the Encrypted
+TLS Handshake proposal. If SNI, ALPN, and the ServerHello can be encrypted
+using an ephemeral exchange that is authenticated later in the handshake,
+the adversary loses a great deal of information about the user's intended
+destination site. When large scale CDNs and multi-site load balancers are
+involved, the ultimate destination would be impossible to determine with this
+type of handshake in place. This will aid in defenses against traffic
+fingerprinting and traffic analysis, which we describe in detail in the next
+section.
+
+% FIXME: Cite https://tools.ietf.org/html/draft-ray-tls-encrypted-handshake-00
+
+\section{Traffic Fingerprinting and Traffic Analysis}
+
+Website Traffic Fingerprinting is the process of using machine learning to
+classify web page visits based on their encrypted traffic patterns. It is most
+effective when exact request and response lengths are visible, and when the
+classification domain is limited by knowledge of the specific site being
+visited.
+
+Tor's fixed 512 byte packet size and large classification domain go a long way
+to impede this attack with minimal overhead. The 512 byte packet size helps to
+obscure some amount of length information, and because Tor's link encryption
+conceals the destination website, classifier accuracy and capabilities are
+reduced, due largely to the Base Rate Fallacy. There was some initial
+controversy in the literature as to the exact degree to which this was the
+case, but after we publicly requested that these effects be studied in closer
+detail, recent results have confirmed and quantified the benefits conferred by
+Tor's unique link encryption.
+
+For this reason, we have been encouraging continued study of low-overhead
+defenses against traffic fingerprinting. We are optimistic that clever use of
+request bundling and response chunking can be combined with minimal amounts of
+padding to significantly reduce the accuracy of this attack, even when the
+attack is combined with prior information that reduces the size of the
+classification domain.
+
+With the aid of an encrypted TLS handshake, we are hopeful that these
+defenses will also be applicable to non-Tor TLS sessions. This will
+also serve to increase the difficulty of end-to-end correlation and general
+traffic analysis of Tor Exit node traffic.
+
+% FIXME: Cite Juarez's paper and wfpadtools
+
+\subsection{Traffic Analysis Issues with HTTP/2}
+
+In our preliminary investigation of HTTP/2, however, we discovered that
+certain aspects of the protocol may aid certain types of traffic analysis
+attacks.
+
+In particular, the PING and SETTINGS frames are acknowledged immediately by
+the client which might give servers the option to collect information about a
+client via timing side-channels. They also allow the server to introduce an
+active traffic pattern that can be used for end-to-end correlation or
+confirmation, independent of client behavior.
+
+It is true that there are other means an attacker could use for the same
+purpose (such as redirects or JavaScript), but these mechanisms can be
+disabled by the user, reflected in UI activity, or otherwise mitigated by Tor
+Browser.
+
+In Tor Browser, we will likely close the connection after receiving some rate
+of unsolicited PING or SETTINGS updates, and introduce delay or jitter when
+responding to these requests until that point. However, lack of explicit
+guidance in the specification about this issue raises concerns about what
+frequencies of these frames are likely to constitute attacks, or instead
+represent normal server behavior in the wild due to overly-aggressive HTTP/2
+implementations.
+
+\subsection{Future Traffic Analysis Resistance Enhancements for HTTP/3}
+
+In terms of assisting traffic analysis defenses, we would like to see
+capabilities for larger amounts of per-frame padding, and more fine-grained
+client-side control over frame sizes. Unfortunately the 256 bytes of padding
+provided by HTTP/2 is likely to be inconsequential when combined with a 16K
+frame size.
+
+In collaboration with researchers at the University of Leuven, the Tor Project
+has also developed a protocol and prototype implementation for communicating
+statistical schedules for asynchronous padding from Tor clients to Tor relays.
+The research community is currently in the process of evaluating the efficacy
+of this protocol against traffic fingerprinting and other traffic analysis
+attacks.
+
+Pending the results of this analysis, these padding commands could form the
+basis of new HTTP/3 frame commands for communicating more sophisticated (yet
+still traffic-bounded) padding schedules to HTTP/3 servers.
+
+% FIXME: Cite.
+
+\section{Tor Network Compatibility}
+
+Our final area of concern is continued compatibility of the Tor network with
+future versions of the HTTP protocol.
+
+It is our understanding that there is a desire for future versions of HTTP to
+move to a UDP transport layer so that reliability, congestion control, and
+client mobility will be more directly under control of the application layer.
+
+At present, the Tor Network is only capable of carrying TCP traffic. While we
+would like to support UDP traffic and indeed eventually transition the entire
+Tor network to our own datagram protocol with custom congestion and flow
+control, additional research is still needed to examine the anonymity
+implications associated with this transition. Our present estimate is that a
+full network transition to UDP is at least five years away.
+
+% FIXME: Cite Murdoch's UDP study
+
+While it will be technically possible to support the transit of UDP inside our
+existing TCP overlay network without significant anonymity risks within a
+year's time or sooner, it is unlikely that this level of support will be
+sufficient to warrant the use of a finely-tuned UDP version of HTTP rather
+than a TCP variant.
+
+We are also concerned that even with a full network transition to a datagram
+transport, it is likely that the congestion, flow, and reliability control of
+a UDP version of HTTP/3 may still end up performing poorly over higher-latency
+overlay networks such as ours. We are especially interested in ensuring that
+overlay networks are taken into account in the design of any UDP-based future
+versions of HTTP, and would also prefer to retain the ability to use future
+HTTP versions over TCP, should the UDP implementations prove suboptimal for
+our use case.
+
+
+
+\bibliographystyle{plain} \bibliography{HTTP3}
\clearpage
\appendix
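
Illustrative note (not part of the patch above): the "Traffic Analysis Issues
with HTTP/2" subsection added by this commit describes closing a connection
after some rate of unsolicited PING or SETTINGS frames, and adding delay or
jitter to acknowledgements before that point. A minimal Python sketch of such
a policy might look like the following. The class name, the thresholds, and
the sliding-window approach are all assumptions made for illustration; the
paper does not specify them, and this is not Tor Browser code.

```python
import random
from collections import deque


class FrameRatePolicy:
    """Hypothetical client-side policy for unsolicited PING/SETTINGS frames."""

    def __init__(self, max_frames=5, window_s=10.0, max_jitter_s=0.25, rng=None):
        self.max_frames = max_frames      # assumed threshold, not from any spec
        self.window_s = window_s          # assumed sliding-window length (seconds)
        self.max_jitter_s = max_jitter_s  # assumed upper bound on ACK delay
        self.rng = rng or random.Random()
        self.arrivals = deque()           # arrival times still inside the window

    def on_frame(self, now):
        """Return ("close", 0.0) or ("ack", delay_s) for a frame arriving at `now`."""
        # Forget arrivals that have fallen out of the sliding window.
        while self.arrivals and now - self.arrivals[0] > self.window_s:
            self.arrivals.popleft()
        self.arrivals.append(now)
        # Too many frames inside the window: treat as a probable probe.
        if len(self.arrivals) > self.max_frames:
            return ("close", 0.0)
        # Otherwise acknowledge, but only after a random delay to blur timing.
        return ("ack", self.rng.uniform(0.0, self.max_jitter_s))
```

A caller would invoke on_frame() with a monotonic timestamp for each
unsolicited frame, then either tear down the connection or schedule the ACK
after the returned delay. Choosing real values for the thresholds is exactly
the open question the paper raises about missing guidance in the
specification.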