[tor-dev] Torperf implementation considerations (was: Torperf)

Tue Sep 17 10:48:50 UTC 2013

On 9/17/13 3:33 AM, Kevin Butler wrote:
> [cc tor-dev]
> 
> On 16 September 2013 09:47, Karsten Loesing <karsten at torproject.org> wrote:
> 
>> Hmm, I don't think the HTTP client/server part is the right interface to
>> write another client and server and call it Torperf compatible.  The
>> Torperf data API would be a better interface for that: people could
>> write their own experiments and provide data that is Torperf compatible,
>> or they could use the data that Torperf provides and analyze or
>> visualize it in a better way.  But writing only half of an experiment,
>> client or server, wouldn't be of much use.
>>
>>
> I thought a bit more about this and I don't fully agree (but you've nailed it
> that the data is the core api). I've come to the conclusion that I think it
> makes sense for TorPerf to do much of the heavy lifting and the core server
> aspects, but I think the experiments should be decoupled in such a way as
> to allow for flexible client implementations.

Ah, let me clarify what I meant above: splitting the client part and the
server part of an experiment doesn't seem of much use to me.  For
example, the HTTP/SOCKS client that fetches static files and the HTTP
server that serves those files shouldn't be distributed to two code
repositories or packages.  Because if either part changes, the other
part needs to be changed, too.

But I totally agree with you that it should be easy to add new
experiments to Torperf.  When I mentioned the data API, my idea was that
somebody writes their own Torperf and provides data in a format that our
Torperf understands, or that somebody takes our results and does neat
stuff with them.

Of course, another way to allow for adding new experiments is to define
a clear interface for extending our Torperf to support them.  That's
what you have in mind, I think.

> Below are a bunch of semi related musings on how to get to this:

Without going into the details, there are some great ideas below!

Can you help me add some structure to your ideas by adding them to the
appropriate sections of the design document?  Can you clone the Git
repo, edit the .tex file, commit your changes, run git format-patch
HEAD^, and send me the output?  Here's the repository:

https://gitweb.torproject.org/user/karsten/tech-reports.git, branch torperf2

A few quick comments:

> Of course as many experiments will just be doing simple http requests, the
> experiments will really just be wrapper scripts around a bundled default
> client implementation which would be similar to the one in your perfd
> branch.
> 
> Specifically, to aid in this, I'd propose something like the folder
> structure below: https://etherpad.mozilla.org/iqrgueVFd6
> 
> [Why a set of directories? There's not a solid reason other than it forces
> unique names and gives good opportunity to isolate experiment specific
> data. If an experiment has no specific data files it should be alright to
> just have the config file instead of a directory. A directory structure
> also mimics the idea of a restful web service as mentioned in the pdf, i.e.
> the user could easily know to go to http:/.../results/myexperiment/ to see
> results filtered for that single experiment. Either way, it's a minor
> detail, I just feel it's easier for a user.]

I like the idea of configuration directories.

> A good use of the specific experiment data could be for static files, where
> anything in an experiment's '/public' folder would be served by the
> webserver while that experiment is running. To ensure nothing gets cached
> by a proxy, the experiment could be responsible for generating random data
> files for each run. (This is a clear separation of server vs experiment
> implementation concern.) Another idea could be custom experiment views
> (JS probably) that would transform the experiment results when viewed on
> the web dashboard.
> 
> The experiments should not be responsible for determining details such as
> SOCKS port or control port; the TorPerf service should deal with load
> balancing a bunch of tor instances on different ports and just tell an
> experiment 'Do your stuff using this SOCKS port, this public ip:port,
> etc...' via environment variables. (The config could ask that certain
> aspects of the torrc file are set up specifically, but managing ports and
> stuff is just asking for user error unless there's some experiment that
> requires it.)
> 
> The config should minimally have an execution rate and a command. The
> command could be to just execute the bundled TorPerf client implementation
> with boring parameters e.g. just fetch facebook.com and record the normal
> timings that the default implementation tracks.
> 
> A more interesting config's command could be the alexa_top_100 where it's
> just a basic script/makefile that fetches a new alexa list if the one in
> the local folder is older than X days and then for each site in the list it
> runs the TorPerf client instance.
> 
> The TorPerf service instance should be able to run experiments by just
> executing the commands and recording whatever is written to stdout and
> stderr. After the command exits, if its exit code is non-zero then the
> stderr output is treated as error messages, while if it's zero then it's
> treated as info messages. The stdout data should be treated as TorPerf
> results and it's an error if it's not well formed. If it's not well formed
> it should be captured in the results file for debug purposes. An example
> of informational output might be that the alexa_top_100 experiment updated
> its list before this result set. [On the web interface it should be clear
> which result sets contained errors or information.]

Executing scripts and reading stdout/stderr is probably too low-level.
I think we need a Python/Twisted (or whatever language Torperf will be
written in) interface for running an experiment and retrieving results.
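
To make the two options concrete, here is a rough sketch of how they could
fit together: a small Python interface for experiments, plus one
implementation that wraps the command-and-stdout/stderr convention quoted
above.  Everything named here (Experiment, CommandExperiment,
ExperimentResult, the TORPERF_SOCKS_PORT variable) is made up for
illustration, and it uses plain subprocess rather than Twisted; none of it
is existing Torperf code.

```python
import os
import subprocess
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ExperimentResult:
    """Hypothetical container for what one experiment run produced."""
    results: str                                   # raw stdout, meant to be Torperf-formatted
    messages: list = field(default_factory=list)   # stderr lines
    failed: bool = False                           # True: messages are errors, else info


class Experiment(ABC):
    """Hypothetical interface the service would call for every experiment."""

    @abstractmethod
    def run(self, socks_port: int) -> ExperimentResult:
        """Run once using the SOCKS port chosen by the service."""


class CommandExperiment(Experiment):
    """Wraps an external command, following the quoted stdout/stderr idea."""

    def __init__(self, command):
        self.command = command

    def run(self, socks_port):
        # The service picks the port and hands it over via the environment.
        # The variable name is an assumption, not an agreed convention.
        env = dict(os.environ, TORPERF_SOCKS_PORT=str(socks_port))
        proc = subprocess.run(self.command, env=env,
                              capture_output=True, text=True)
        # Non-zero exit code: stderr lines are errors; zero: they are info.
        messages = [line for line in proc.stderr.splitlines() if line]
        return ExperimentResult(results=proc.stdout,
                                messages=messages,
                                failed=proc.returncode != 0)
```

An experiment written directly in Python would subclass Experiment and
skip the subprocess step entirely, while wrapper scripts would still work
through CommandExperiment.  Checking that the stdout data is well formed
is left out here because the result format isn't defined yet.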

> Once the experiment finishes, the service should postprocess the results
> from the experiment and replace any fingerprinted entries (this needs to be
> well defined) with the server side timing information for that specific
> fingerprint. Then the server should store the results file in a timestamped
> file (timestamped probably by experiment start time) and update its
> database (if there is one).
> 
> The experiments should be able to specify their required tor version in
> their config, but it should accept placeholder values such as the default
> 'latest', 'latest-stable' or even 'latest(3)' which would run the
> experiment for all 3 of the latest tor versions. I think the ability to
> check the performance of the same experiment over multiple Tor versions
> could be interesting, especially to determine if any one build has caused
> anomalies in performance. I would expect very few experiments to run across
> multiple versions though.
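
One possible reading of those placeholders, as a minimal sketch: the
function name, its arguments, and the assumption that the service already
knows the installed versions ordered newest first are all mine, and the
version labels in the usage note are dummies rather than real tor
releases.

```python
import re


def resolve_tor_versions(spec, available, stable):
    """Map a config value to the list of tor versions to run against.

    `available` and `stable` are lists of version strings, newest first
    (hypothetical inputs that the service would have to provide).
    """
    if spec == "latest":
        return available[:1]
    if spec == "latest-stable":
        return stable[:1]
    # 'latest(N)' means the N newest versions.
    match = re.fullmatch(r"latest\((\d+)\)", spec)
    if match:
        return available[:int(match.group(1))]
    # Anything else is taken as an explicit version string.
    return [spec]
```

With available = ["v3", "v2", "v1"], the value 'latest(2)' would select
"v3" and "v2", and an experiment configured that way would simply be run
once per selected version.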
> 
> Additionally, something that could be neat, though it's not clearly in the
> requirements: should TorPerf be responsible for notifying the user when
> there are new clients available to run as latest? Would it be useful to be
> able to specify that some experiments should be run on 'master' or a gitref
> that would be pulled between runs? That's probably not practical.
> 
> Apologies for the length and lack of order!

Well, thanks for your input!  As I said above, it would help a lot if
you added these ideas to the appropriate sections of the design document.

Thanks in advance!

All the best,
Karsten