[tor-dev] All the problems about Stegotorous

Sun Jan 13 18:04:23 UTC 2013

On Thu, Jan 10, 2013 at 12:18:17PM -0700, vmonmoonshine at gmail.com wrote:
> I was talking to Roger yesterday on the IRC, and he mentioned that
> "[S]tegotorus ... has a whole lot of problems". I have heard this many
> times in different forms by now (in Florence, The sponsor F discussion, 
> etc). But I never saw these "lot of problems" are broken down in a list,
> so at least one can attack them one by one. It was always "lots of problem".
> 
> So, let this email be an appeal to all of you who have some problem,
> deficiency, architectural dissatisfaction, etc with Stegotorus, write
> back on this thread, so at least we have written account of these
> problems and dissatisfactions? 

Hi vmon,

Thanks for starting the thread.

My main issues with Stegotorus currently are more on the research (ok,
maybe it's better called design) side.

1) Like the FTE paper
(https://www.torproject.org/docs/pluggable-transports), the main
contribution of Stegotorus is to provide a framework for plugging in steg
modules. There are several example steg modules to choose from. The idea
is that even if the ones they offer now aren't suitable, if you *had*
a good one, you could just pop it in. The trouble is that I don't know
of any good ones, and I think that's a harder problem than people think.

2) There also remains the issue of where you get your covertexts. While
FTE says "we will build a brilliant regexp to characterize the format
of the thing we hide our content in" (which has its own problems --
anything your regexp misses is a crack in the armor), Stegotorus says
"we will build a big library of example things, by crawling the Internet,
and then we'll hide our content in them". Where does this library come
from? How does every Stegotorus bridge get its own library? What happens
when you reuse an item in your library? How do *clients* generate their
own library? I think there are lots of ways to lose plausibility that
haven't been explored.

2') One of the proposed ways for clients to generate their library of
plausible covertexts is to basically wiretap the user and then replay
her own traffic later with the Tor flow embedded in it. First there
are messy engineering questions about tapping the user in a portable way;
but I worry even more about the privacy issues introduced by repeating
earlier traffic. Also, does it introduce new distinguishing attacks,
like "look for variations on the same request"? I recognize that *not*
using real client traffic also causes problems, e.g. "why is that user,
who usually uses IE, sending a user-agent of Chromium?"

3) What's the overhead of putting your Tor traffic through each of the
steg modules? It's my understanding that some of the Stegotorus steg
modules produce immense size overhead (since the cover-item is large,
and the part of the cover-item you can hide your message in is relatively
small). What are the numbers for the current steg modules that people are
talking about / have built? Is there some correlation between inefficiency
(overhead) and plausibility (indistinguishability)? What are the tradeoffs
if we adopt some sort of "choose the covertext from your library that
minimizes your overhead" policy?

4) And then the last issue isn't so much a design issue as a community or
resource issue -- Zack is busy being a student, and further development
by SRI is complicated by their publication review requirement (which alas
applies to their code contributions too).
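
To make the overhead question in 3) concrete, here is a minimal Python
sketch of how one might measure per-item overhead and apply a "choose the
covertext that minimizes your overhead" policy. It is an illustration
only: the Covertext fields, the function names, and the idea that each
item has a single fixed capacity are all assumptions, not how the
Stegotorus steg modules actually represent their libraries.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Covertext:
    """Hypothetical library entry; field names are assumptions."""
    name: str
    total_bytes: int      # size of the whole cover-item on the wire
    capacity_bytes: int   # payload bytes that can be hidden inside it


def overhead(cover: Covertext, payload_bytes: int) -> float:
    """Wire bytes sent per payload byte when hiding the payload in cover."""
    if payload_bytes <= 0:
        raise ValueError("payload must be at least one byte")
    if payload_bytes > cover.capacity_bytes:
        raise ValueError("payload does not fit in this covertext")
    return cover.total_bytes / payload_bytes


def pick_min_overhead(library: Iterable[Covertext],
                      payload_bytes: int) -> Optional[Covertext]:
    """Return the fitting covertext with the lowest overhead, or None."""
    if payload_bytes <= 0:
        raise ValueError("payload must be at least one byte")
    fitting = [c for c in library if c.capacity_bytes >= payload_bytes]
    if not fitting:
        return None
    # For a fixed payload size, lowest overhead means smallest cover-item.
    return min(fitting, key=lambda c: c.total_bytes)
```

Note that a deterministic policy like this always returns the same item
for a given payload size, so it interacts directly with the reuse
question in 2): minimizing overhead and avoiding repeated covertexts may
pull in opposite directions.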

I think having some thorough explorations of 1-3 would put us in a much
better position.

--Roger