[tor-dev] Client simulation
Norman Danner
ndanner at wesleyan.edu
Tue Jun 11 15:50:35 UTC 2013

On 6/10/13 4:40 AM, Karsten Loesing wrote:
>>> On 6/6/13 7:32 PM, Norman Danner wrote:
>>>> I have two questions regarding a possible research project.
>>>>
>>>> First, the research question: can one use machine-learning techniques
>>>> to construct a model of Tor client behavior? Or in a more general form:
>>>> can one use <fill-in-the-blank> to construct a model of Tor client
>>>> behavior? A student of mine did some work on this over the last year,
>>>> and the results are encouraging, though not strong enough to do anything
>>>> with yet.
>>
>> The intent is that each cluster (represented by a single hidden Markov
>> model) represents a "type" of client, even though we don't know for sure
>> what that client type does. We can make some guesses about some: the
>> "type" of steady high-volume cell counts is probably a bulk downloader;
>> the "type" of steady zero cell counts is probably an unused circuit;
>> etc. But in some sense, I'm thinking that what counts is the behavior
>> of the client, not the reason for that behavior. We don't have to
>> instrument clients for this. Of course, then one has to ask whether
>> this kind of modeling is in fact useful. It is somewhat different than
>> what you are envisioning, I think.
>>
>> There are about a billion variations (at last count) on this theme. We
>> chose one particular one as a test case to play with the methodology. I
>> think the methodology is mostly OK, though I'm not completely satisfied
>> with the results of the particular variation Julian worked on. So now
>> I'm trying to figure out whether to push this forward and in particular
>> what directions and end goals would be useful.
>
> Interesting stuff! You're indeed taking a different approach than I
> was envisioning by gathering data on a single guard rather than on a
> set of volunteering clients. Both approaches have their pros and cons,
> but I think your approach leads to some interesting results and can be
> done in a privacy-preserving fashion.
>
> Two thoughts:
>
> - I could imagine that your results are quite valuable for modeling
> better Shadow/ExperimenTor clients or for deriving better client models
> for Tor path simulators. Maybe Julian's thesis already has some good
> data for that, or maybe we'll have to repeat the experiment in a
> slightly different setting. I'm cc'ing Rob (the Shadow author) and
> Aaron (working on a path simulator) to make sure they saw this thread.
> I can help by reviewing code changes to Tor to make sure data is
> gathered in a privacy-preserving way, and I'd appreciate if those code
> changes would be made public together with analysis results.

I'm in the process of rewriting the data collection code, and will
e-mail later with some of the details. But maybe off-list initially, as
I think the first few passes will be very special-purpose and hence not
of general interest (though I'm happy to discuss it more publicly if
that's more appropriate).

Right now I'm considering focusing on trying to get a reasonable
(partial) answer to the following question: how well do various
timing-analysis attacks actually work? That is, how well do they work
when the client model is "accurate"? I'm not even sure how exactly to
define "accurate," though I can think of at least a few different ways.
But I'm hoping that by focusing on a relatively narrow question, we
can identify manageable chunks of questions about what kinds of data can
reasonably be collected, and how we can use that data for other purposes.

> - It might be interesting to observe how Tor usage changes over time.
> Maybe the research experiment leads to a set of classifiers telling us
> when a circuit is most likely used for bulk downloads, used for web
> browsing, used for IRC, unused, or whatever. We could then extend
> circuit statistics to have all relays report aggregate data of how
> circuits can be classified. Requires a proposal and code, but I could
> help with those.

Yes, I can see a number of longer-range applications like this. I'm not
sure I want to think about proposals and code just yet.
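
To make the per-circuit classification idea discussed above a bit more
concrete, here is a minimal sketch of how a circuit's cell-count sequence
could be assigned to the hidden Markov model (client "type") that explains
it best. Everything specific in it is an assumption for illustration only:
the three-bucket encoding of cell counts, the two model names, and all of
the probabilities are made up, not fitted to any collected data, and this
is not the code or the models from Julian's work.

```python
import math

# Observation symbols: cell counts per time window, bucketed (assumed encoding).
NONE, LOW, HIGH = 0, 1, 2


def log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm.

    obs   -- non-empty list of observation symbols
    start -- start[i] is the probability of beginning in state i
    trans -- trans[i][j] is the probability of moving from state i to j
    emit  -- emit[i][o] is the probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            o = obs[t]
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                for j in range(n)
            ]
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale at every step so long sequences do not underflow.
        loglik += math.log(total)
        alpha = [a / total for a in alpha]
    return loglik


# Hypothetical, hand-picked models: (start, trans, emit). Not fitted to data.
MODELS = {
    "unused": (
        [0.9, 0.1],
        [[0.95, 0.05], [0.5, 0.5]],
        [[0.98, 0.01, 0.01], [0.6, 0.3, 0.1]],
    ),
    "bulk": (
        [0.2, 0.8],
        [[0.5, 0.5], [0.05, 0.95]],
        [[0.3, 0.5, 0.2], [0.01, 0.09, 0.9]],
    ),
}


def classify(obs, models=MODELS):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


if __name__ == "__main__":
    print(classify([NONE] * 6))
    print(classify([HIGH, HIGH, HIGH, LOW, HIGH, HIGH]))
```

In a real setting the models would come out of the clustering step rather
than being written by hand, and a relay would only report aggregate counts
per type, not per-circuit sequences.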
- Norman
--
Norman Danner - ndanner at wesleyan.edu - http://ndanner.web.wesleyan.edu
Department of Mathematics and Computer Science - Wesleyan University