[tor-dev] [tor-talk] Client simulation

Fri Jun 7 06:37:45 UTC 2013

(Sorry for cross-posting, but I think this is a topic for tor-dev@, not
tor-talk@.  If you agree, please reply on tor-dev@ only.  tor-talk@
people can follow the thread here:

https://lists.torproject.org/pipermail/tor-dev/2013-June/thread.html)

On 6/6/13 7:32 PM, Norman Danner wrote:
> I have two questions regarding a possible research project.
> 
> First, the research question:  can one use machine-learning techniques
> to construct a model of Tor client behavior?  Or in a more general form:
>  can one use <fill-in-the-blank> to construct a model of Tor client
> behavior?  A student of mine did some work on this over the last year,
> and the results are encouraging, though not strong enough to do anything
> with yet.
> 
> Second, the meta-question:  is it worthwhile to answer the first
> question?  It seems to me that if the answer to the first question is
> "yes," then the solution could be used to (at least) provide better
> simulations of Tor (e.g., via Shadow or ExperimenTor).  This possibly
> naive thought would imply that the answer to the second question is "yes."
> 
> I'd be interested to hear responses to my second question, either
> validating my naive thought or explaining why the first question isn't
> worth answering.  I'd accept responses to my first question, too, in
> case this has already been done.

Hi Norman,

Yes, it's worthwhile to answer this question!  I can imagine how at
least Shadow and the Tor path generator would benefit from better client
models.  User number estimates on the metrics website might benefit from
them, too.

I found two tickets where we asked similar questions before, and maybe
there are more tickets like these:

https://trac.torproject.org/projects/tor/ticket/2963

https://trac.torproject.org/projects/tor/ticket/6295

Some very early thoughts:

- How do we make sure that we ask a representative set of people to
instrument their clients and export data on their usage behavior?  If we
only ask people who read their favorite news site twice per day, our
client model will be just that, but not representative of all Tor
users.  (Still, we would know more than we know now.)

- Can we somehow aggregate usage information enough to make it safe for
people to send actual usage reports to us?  I could imagine having a
torrc flag that is disabled by default and that, when enabled, writes
sanitized usage information to disk.  For this we need a very good idea
of what we're planning to do with the data, and we'll need to specify the
aggregation approach in a tech report and get it reviewed by the community.
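
To make the aggregation question in the second point a bit more concrete,
here is a minimal sketch of one possible approach: drop timestamps, keep
only per-port totals, and round each total up to a multiple of a bin size.
Everything in it is an assumption for illustration only: the (hour, port)
event format, the bin size of 8, and the function names are hypothetical,
and none of this is an existing Tor feature or the approach a reviewed
tech report would necessarily settle on.

```python
from collections import Counter

# Assumed granularity for binning; a real design would need to justify this.
BIN_SIZE = 8


def round_up(count, bin_size=BIN_SIZE):
    """Round a positive count up to the next multiple of bin_size."""
    return ((count + bin_size - 1) // bin_size) * bin_size


def sanitize_usage(events, bin_size=BIN_SIZE):
    """Reduce (hour, port) events to binned per-port totals.

    The hour of each event is discarded, so the report says nothing about
    when a client was active, and exact counts are hidden by the binning.
    """
    per_port = Counter(port for _hour, port in events)
    return {port: round_up(n, bin_size) for port, n in sorted(per_port.items())}


# Hypothetical local usage log: 17 streams to port 443, 3 to port 80.
events = [(9, 443)] * 13 + [(9, 80)] * 3 + [(21, 443)] * 4
report = sanitize_usage(events)
print(report)
```

Binning alone is unlikely to be sufficient for safety; the sketch only shows
the kind of transformation that would have to be specified and reviewed.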

Are your student's results available somewhere?

Best,
Karsten