[tor-dev] Prototype Primitives for Website Traffic Fingerprinting defenses

Mike Perry mikeperry at torproject.org
Tue Jul 29 05:07:10 UTC 2014


Hello all,

What follows is a summary of the primitives that Marc Juarez aims to
implement for his Google Summer of Code project on prototyping defenses
for Website Traffic Fingerprinting and follow-on research.

After a review of Tamaraw[1], Adaptive Padding[2], CS-BuFLO[3], and the
Supersequence[4] work, as well as some discussion at the Tor dev
meeting, we came up with this list of padding message primitives.

These functions are meant to be sent as command cells from one endpoint
to the other. Some are bidirectional commands, others can only be sent
in one direction (client to relay, or relay to client).

For now, they will be implemented as ad-hoc messages between two
modified obfsproxy (called wfpadtools) instances. The source code for
this fork lives here for the moment:
https://bitbucket.org/mjuarezm/obfsproxy-wfpadtools/

Long term, I am thinking that all of these should be specified as if
they were their own RELAY_* cell type from tor-spec.txt:
https://gitweb.torproject.org/torspec.git/blob/HEAD:/tor-spec.txt#l1215

This means that padding would be a circuit-level property, and it would
be possible to send it to and from any hop in the circuit, due to
leaky-pipe topology (using the "recognized" field). In a world of very
cheap and excessive middle and Guard node bandwidth, we would run this
padding to the middle node. For the wfpadtools prototype, it will
obviously only cover the first hop.


Here's the list of primitives, broken down by research paper:

Generic Messages (common to all defenses):
 RELAY_IGNORE()
   Description:
     Simple fixed-length (CELL_SIZE) padding cell.
   Direction:
     Bidirectional (client to relay, relay to client)
   Parameters:
     None

 RELAY_SEND_PADDING(N,t)
   Description:
     Send the requested number of padding cells in response.
   Direction:
     Client to relay
   Parameters:
     N:
       Number of padding cells to send in response to this cell
     t:
       Delay, in microseconds, before sending the padding cells
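
As a rough illustration of how a relay-side handler for
RELAY_SEND_PADDING(N,t) might behave, here is a minimal sketch. The
names handle_send_padding and make_ignore_cell, the 512-byte CELL_SIZE,
and the all-zero payload are assumptions for illustration only; they are
not taken from wfpadtools or tor-spec.txt.

```python
CELL_SIZE = 512  # assumed fixed cell length in bytes (hypothetical value)


def make_ignore_cell():
    """Build one fixed-length padding payload (all zero bytes here)."""
    return bytes(CELL_SIZE)


def handle_send_padding(n, t_us, now_us):
    """Return (send_time_us, cells) for a SEND_PADDING(N, t) request.

    n     -- number of padding cells requested
    t_us  -- delay in microseconds before sending
    now_us -- current time in microseconds
    """
    if n < 0 or t_us < 0:
        raise ValueError("N and t must be non-negative")
    cells = [make_ignore_cell() for _ in range(n)]
    return now_us + t_us, cells
```

A real implementation would hand the cells to its event loop to be
written at the returned time rather than returning them.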

 RELAY_APP_HINT(session_id, status)
   Description:
     A hint from the application layer for session start/stop
   Direction:
     Client to relay
   Parameters:
     session_id:
       A string identifying the session (e.g., a keyed hash of the
       URL bar domain)
     status:
       "Start" or "Stop", indicating session start and end.

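One way the client could derive such a session_id is with an HMAC over
the URL bar domain. This is only a sketch: the choice of HMAC-SHA256,
the lowercase normalization, and the names make_session_id and
make_app_hint are assumptions, not part of the proposal.

```python
import hashlib
import hmac


def make_session_id(key, url_bar_domain):
    """Return a hex keyed hash of the URL bar domain (HMAC-SHA256 assumed)."""
    normalized = url_bar_domain.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def make_app_hint(key, url_bar_domain, status):
    """Build the parameters of a RELAY_APP_HINT message as a dict."""
    if status not in ("Start", "Stop"):
        raise ValueError('status must be "Start" or "Stop"')
    return {
        "session_id": make_session_id(key, url_bar_domain),
        "status": status,
    }
```

Keying the hash with a secret known only to the client means the relay
can match Start and Stop hints for the same session without learning
the domain itself.
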

Adaptive Padding Messages:
 RELAY_BURST_HISTOGRAM(histogram[], labels_ms[], remove_toks, fuzz, when)
   Description:
     Specifies a histogram that encodes a delay distribution
     representing the probability of sending a single padding packet
     after a given delay in response to either an upstream cell, or a
     client-originating cell.
   Direction:
     Client to relay
   Parameters:
     histogram[]:
       Contains delay distribution of sending an IGNORE packet
       after sending a real packet
     labels_ms[]:
       millisecond labels for the bins (with "Infinity"/"Ignore" bin to
       allow encoding the probability of not sending any padding packet in
       response to this packet).
     remove_toks:
       If true, follow Adaptive Padding token removal rules.
       If false, histograms are immutable.
     fuzz:
       If true, randomize the delay uniformly between bin labels.
       If false, use bin labels as exact delay values.
     when:
       If set to "receive", this histogram governs the probability of
       sending a padding packet after some delay in response to a packet
       originating from the client. If set to "send", this histogram
       governs padding packets that are transmitted after a packet arrives
       from upstream (the middle node). In both cases, the padding packet
       is sent in the direction of the client.
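
To make the histogram semantics concrete, here is a minimal sketch of
sampling a padding delay from (histogram[], labels_ms[], fuzz). Several
details are assumptions on my part: the "Infinity" bin is represented by
math.inf, bin i is taken to span from labels_ms[i] up to the next finite
label when fuzz is true, and the function name is hypothetical. Token
removal (remove_toks) is not modeled here.

```python
import math
import random


def sample_padding_delay(histogram, labels_ms, fuzz, rng=random):
    """Pick a delay in ms before sending one padding cell.

    Returns None when the "Infinity" bin is chosen (send no padding)
    or when the histogram holds no tokens at all.
    """
    if len(histogram) != len(labels_ms):
        raise ValueError("histogram and labels_ms must have equal length")
    if sum(histogram) <= 0:
        return None

    # Choose a bin with probability proportional to its token count.
    i = rng.choices(range(len(histogram)), weights=histogram)[0]
    low = labels_ms[i]
    if math.isinf(low):
        return None
    if not fuzz:
        return low

    # Assumed fuzz rule: uniform between this label and the next finite one.
    high = labels_ms[i + 1] if i + 1 < len(labels_ms) else low
    if math.isinf(high):
        high = low
    return rng.uniform(low, high)
```

For example, with histogram [3, 0, 0] and labels [10, 20, math.inf],
fuzz=False always yields 10, while fuzz=True yields a value between 10
and 20.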

 RELAY_GAP_HISTOGRAM(histogram[], labels_ms[], remove_toks, fuzz, when)
   Description:
     Specifies a histogram that encodes a delay distribution
     representing the probability of sending a single additional padding
     packet after a given delay following a padding packet that
     originated at this hop. 
   Direction:
     Client to relay
   Parameters:
     histogram[]:
       Contains delay distribution of sending an IGNORE packet after sending
       an IGNORE packet
     labels_ms[]:
       millisecond labels for the bins (with "Infinity"/"Ignore" bin to
       allow encoding the probability of not sending any padding packet in
       response to this packet).
     remove_toks:
       If true, follow Adaptive Padding token removal rules.
       If false, histograms are immutable.
     fuzz:
       If true, randomize the delay uniformly between bin labels.
       If false, use bin labels as exact delay values.
     when:
       If "receive", this histogram applies after locally-inserted padding
       packets that were initially sent in response to client-originated
       data. If "send", it applies after locally-inserted padding packets
       that were initially sent in response to upstream data. Note that
       this means that implementations must maintain this metadata as
       internal state as the system transitions from BURST_HISTOGRAM
       initiated padding into GAP_HISTOGRAM initiated padding. In both
       cases, the padding packet is sent in the direction of the client.
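
The state-keeping requirement above could look roughly like the
following sketch. The class and method names are hypothetical, and the
histograms are treated as opaque objects; the only point is that the
"when" value of the real cell that started a padding run is remembered,
so that the matching GAP_HISTOGRAM is used for follow-up padding.

```python
class PaddingState:
    """Tracks which trigger ("receive" or "send") began the padding run."""

    def __init__(self, burst_histograms, gap_histograms):
        # Both arguments map "receive"/"send" to a histogram object.
        self.burst = burst_histograms
        self.gap = gap_histograms
        self.origin = None  # set once a real cell has been seen

    def on_real_cell(self, when):
        """A real cell was seen; return the BURST histogram to sample."""
        if when not in ("receive", "send"):
            raise ValueError('when must be "receive" or "send"')
        self.origin = when
        return self.burst[when]

    def on_padding_sent(self):
        """A local padding cell was sent; return the GAP histogram to sample."""
        if self.origin is None:
            raise RuntimeError("padding sent before any real cell was seen")
        return self.gap[self.origin]
```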


CS-BuFLO Messages:
 RELAY_TOTAL_PAD(session_id, t)
   Description:
     Requests that the endpoint pad all batches to the nearest 2^K cells
     total, or until APP_HINT(session_id, "Stop")
   Direction:
     Client to relay
   Parameters:
     session_id:
       The session ID from APP_HINT()
     t:
       The number of microseconds to wait between cells to consider them
       part of the same batch.
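
As a small worked example of the 2^K rule, the number of padding cells
needed for a batch can be computed as below. I am assuming "nearest"
means the next power of two at or above the batch size, and that an
empty batch needs no padding; the function names are hypothetical.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    if n <= 1:
        return 1
    return 1 << (n - 1).bit_length()


def total_pad_cells(cells_in_batch):
    """Padding cells needed to bring a batch up to 2^K cells total."""
    if cells_in_batch <= 0:
        return 0  # assumption: nothing sent, nothing to pad
    return next_power_of_two(cells_in_batch) - cells_in_batch
```

So a batch of 5 cells gets 3 padding cells (8 total), while a batch of
exactly 8 cells gets none.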

 RELAY_PAYLOAD_PAD(session_id, t, R)
   XXX: The paper was not clear enough for me to understand this primitive


Tamaraw:
 RELAY_BATCH_PAD(session_id, L, t)
   Description:
     Requests that the endpoint pad all batches of cells to a multiple of
     L cells total, or until APP_HINT(session_id, "Stop")
   Direction:
     Client to relay
   Parameters:
     session_id:
       The session ID from APP_HINT()
     L:
       The multiple of cells to pad to.
     t:
       The number of microseconds to wait between cells to consider them
       part of the same batch.
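
The batching and padding rule can be sketched as follows, under my
reading that cells whose inter-arrival gap is at most t microseconds
belong to the same batch, and that each batch is padded up to the next
multiple of L. Both function names are hypothetical.

```python
def split_batches(timestamps_us, t):
    """Group sorted cell timestamps (microseconds) into batches.

    A cell joins the current batch if it arrives no more than t
    microseconds after the previous cell; otherwise it starts a new one.
    """
    batches = []
    for ts in timestamps_us:
        if batches and ts - batches[-1][-1] <= t:
            batches[-1].append(ts)
        else:
            batches.append([ts])
    return batches


def batch_pad_cells(batch_len, L):
    """Padding cells needed to reach the next multiple of L."""
    if L <= 0:
        raise ValueError("L must be positive")
    return (-batch_len) % L
```

With timestamps [0, 100, 250, 5000, 5050], t = 200 and L = 4, this gives
two batches of 3 and 2 cells, which need 1 and 2 padding cells
respectively.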



Comments, questions, and suggestions are welcome! In particular, I'm not
sure I got either the CS-BuFLO or the Tamaraw primitives completely
correct.



1. http://cacr.uwaterloo.ca/techreports/2013/cacr2013-30.pdf
2. http://freehaven.net/anonbib/cache/ShWa-Timing06.pdf
3. http://arxiv.org/abs/1401.6022
4. http://cacr.uwaterloo.ca/techreports/2014/cacr2014-05.pdf


-- 
Mike Perry