[tor-dev] SkypeMorph

Wed Mar 28 22:10:41 UTC 2012

On 12-03-28 02:28 AM, Roger Dingledine wrote:
> On Mon, Mar 26, 2012 at 03:04:47PM -0400, Hooman wrote:
>>> Can you give us some guesses about next steps for resolving these issues
>>> (or explaining why they aren't actually as worrisome as they appear)?
>>>
>>> A) It looks like the transport has no notion of adapting to network
>>> conditions, i.e. congestion control. So it will basically fall apart on
>>> a low-bandwidth or congested network.
>> True, but as mentioned in section 8.2 of the technical report, this
>> can be fixed by considering Skype video calls on different networks,
>> depending on the network status. (the way Skype bandwidth usage
>> varies with available bandwidth is studied, for example: http://www.tlc-networks.polito.it/oldsite/mellia/papers/skype_info08.pdf
>> )
> Isn't that like saying TCP congestion control can be implemented by
> sampling capacity and traffic load on a variety of networks, and then
> hard-coding the TCP window and resend algorithms to suit the network
> you think you're running on?
>
> I'm not worried here so much about whether your flow adapts to network
> conditions like a real Skype flow would (though I agree that's an
> issue). I'm worried about whether your flow would fail to back off at
> all in the face of congestion, leading to a) Skypemorph not getting its
> packets through because so many of them get dropped, and b) Skypemorph
> ruining the network it's running on.
>
>>> B) It sends at a constant rate of 43KB/s in each direction all the
>>> time. Even if users are willing to tolerate that, it doesn't scale on
>>> the bridge/relay side if there are lots of users. I wonder how feasible
>>> a "traffic shaping" approach would be (where the flow rate drops off
>>> if there's no underlying traffic), and how much that would screw with
>>> your statistics. Which leads to:
>> 43KB/s is per connection, so each client gets this bandwidth, while
>> the bridge can have multiple connections.
> Right. But if a bridge wants to handle 10 Skypemorph users, the bridge
> needs to be sending out 430KB/s all the time. That means volunteer users
> can't operate these bridges at home (unless they live in Japan, Korea,
> or Sweden I guess). It also greatly increases the overall traffic cost
> of running a bridge.
>
> For example, during the February weekend when Iran blocked SSL, my
> obfsproxy bridge was easily handling ~500 users at once. With Skypemorph
> that's 172 Mbit/s of duplex traffic?
I will answer the first two questions here: we are going to fix this.
As I mentioned, we are going to do what Skype does: we will use
different levels of bandwidth for SkypeMorph's output, depending on
network status (we can detect this the same way TCP detects congestion)
or on the amount of bandwidth the bridge is willing to dedicate to each
client. Another way to do this is to reduce the bandwidth provided to
each client as the number of clients increases.
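
A minimal sketch of the two ideas above (stepping between bandwidth
levels on a congestion signal, and capping each client by the bridge's
budget). Only the 43 KB/s figure comes from this thread; the other
levels, the function names, and the use of packet loss as the congestion
signal are assumptions for illustration. Stepping down one level at a
time is simpler than TCP's multiplicative decrease and is not what TCP
itself does.

```python
# Hypothetical output levels in KB/s; only 43 KB/s is taken from the thread.
LEVELS_KBPS = [5, 10, 20, 43]


def aggregate_mbit(num_clients, rate_kbps=43):
    """Total one-way bridge traffic in Mbit/s if every client sends at rate_kbps."""
    return num_clients * rate_kbps * 8 / 1000


def per_client_cap(bridge_budget_kbps, num_clients):
    """Highest level that fits in an equal share of the bridge's budget.

    Falls back to the lowest level when even that does not fit.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    share = bridge_budget_kbps / num_clients
    fitting = [level for level in LEVELS_KBPS if level <= share]
    return fitting[-1] if fitting else LEVELS_KBPS[0]


def next_level(index, loss_detected):
    """Step down one level on loss, up one level otherwise (clamped to the list)."""
    if loss_detected:
        return max(index - 1, 0)
    return min(index + 1, len(LEVELS_KBPS) - 1)
```

With these assumed levels, a bridge that budgets 430 KB/s can give 10
clients the full 43 KB/s each, but only 10 KB/s each once 43 clients are
connected.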
>>> C) The packet size and timing distributions only aim to match the
>>> first-order properties of Skype. At the same time, DPI vendors have
>>> already been in a battle with Skype traffic for a while now. How advanced
>>> do you think DPI vendors are at detecting Skype-like traffic, and thus at
>>> distinguishing your traffic from real Skype traffic? Similarly, how bad is
>>> it that you don't follow through with the TCP side of the Skype handshake?
>> The TCP connections are more of control connections and they send a
>> small number of messages during the call and we actually have some
>> ideas on how to deal with this, like handing the sockets for these
>> connections to our software after we fake a call.
> Ok.
>
> What do you think about the "first-order properties" question about size
> and timing (e.g. I bet real Skype traffic does not draw its packet size
> and timing independently from the size and timing of the previous packet)?
> Combined with the fact that DPI vendors have quite a bit of experience
> targeting Skype traffic in particular, I worry that they've thought
> about this specific question more than we have.
Yes, we can definitely go beyond first-order statistics. It should be
fairly straightforward to do so.
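
One possible way to go beyond independent draws, sketched under
assumptions: condition each packet size on the previous one by counting
transitions in a recorded trace. The function names and the toy trace
below are hypothetical, and a real model would also need to cover
inter-packet timing.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(trace):
    """Count how often each packet size follows each other size in a trace."""
    counts = defaultdict(Counter)
    for prev, cur in zip(trace, trace[1:]):
        counts[prev][cur] += 1
    return counts


def sample_next(counts, prev, rng):
    """Draw the next packet size conditioned on the previous size."""
    followers = counts.get(prev)
    if not followers:
        # Previous size never seen as a predecessor: fall back to any known size.
        return rng.choice(sorted(counts))
    sizes = list(followers)
    weights = [followers[size] for size in sizes]
    return rng.choices(sizes, weights=weights, k=1)[0]


# Toy trace of packet sizes in bytes (made up for illustration).
trace = [100, 200, 100, 200, 100]
counts = fit_transitions(trace)
size = sample_next(counts, 100, random.Random(0))
```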
>
>>> D) The morphing output is basically identical to the naive shaping. Are
>>> you sure you did it right?
>> So as mentioned in the report, the original traffic morphing does
>> not consider timing at all (which makes it less effective against
>> DPIs) and it aims at minimizing the overhead, i.e. the number of
>> padding bytes sent on the wire.
> Right. Minimizing padding bytes on the wire is a big reason to like it.
>
>> When we introduced the inter-packet
>> timing feature, it was no longer possible to go with the same
>> construction, since packets may not be sent right away. As a result
>> we tried a different approach for traffic morphing: we buffered
>> packets received from Tor, then when it is time to send the next
>> packet, we simply estimate the original packet size by a sample from
>> Tor's packet size distribution. I know there are other ways this
>> can be done, but in our experiment we didn't observe any tangible
>> difference in the outcome.
> Hrm. So that means your traffic morphing algorithm doesn't try to reduce
> padding bytes? That makes your graph 5 make more sense. But is it really
> accurate to call it morphing still? It would be great to explore that
> tradeoff more.
We called it SkypeMorph since we are still using the morphing matrix.
Although I personally believe we can find a way to minimize the amount
of padding while keeping the timing and sizes statistically
indistinguishable from Skype's, the traffic morphing technique depends
greatly on the characteristics of the source protocol (Tor), and it is
not easy to guess the timing patterns of users behind Tor. For example,
if we use traces from web browsing behind Tor as the input to our
software, and our client then uses Tor to download multimedia content,
traffic morphing would not perform very well.
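
A rough sketch of the buffering approach described earlier in this
thread, not the actual SkypeMorph code: when it is time to send, sample
an assumed source packet size from a Tor size distribution, map it
through the matching column of the morphing matrix to get a target
size, then drain the buffer and pad. All names, the dictionary layout of
the matrix, and the zero-byte padding are assumptions made for
illustration.

```python
import random


def morphed_size(tor_sizes, tor_weights, morph_columns, rng):
    """Pick a target packet size via an assumed source size.

    morph_columns maps a source size to (target_sizes, probabilities),
    i.e. one column of a morphing matrix (hypothetical layout).
    """
    assumed = rng.choices(tor_sizes, weights=tor_weights, k=1)[0]
    targets, probs = morph_columns[assumed]
    return rng.choices(targets, weights=probs, k=1)[0]


def next_packet(buffer, size):
    """Take up to `size` bytes from the buffer and pad the remainder.

    Returns (payload, padding_bytes). `buffer` is a bytearray that is
    consumed in place.
    """
    data = bytes(buffer[:size])
    del buffer[:size]
    padding = size - len(data)
    return data + b"\x00" * padding, padding


rng = random.Random(0)
buf = bytearray(b"a" * 150)                   # bytes waiting from Tor
columns = {512: ([100], [1.0])}               # toy one-column matrix
size = morphed_size([512], [1.0], columns, rng)
first, pad_first = next_packet(buf, size)     # full packet, no padding
second, pad_second = next_packet(buf, size)   # half data, half padding
```

If the assumed source distribution does not match what the client is
really doing (for example, web-browsing traces versus a bulk download),
the padding count returned here is where the mismatch would show up.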
>
> --Roger