[tor-dev] Scaling Tor Metrics
David Goulet
dgoulet at ev0ke.net
Thu Nov 26 19:53:58 UTC 2015
On 25 Nov (16:53:45), Karsten Loesing wrote:
> Hello devs,
>
> the Tor Metrics website [0] claims to be "the primary place to learn
> interesting facts about the Tor network" and invites its visitors who
> "come across something that is missing" to contact the website authors
> about it. That's a bold statement I put there! :)
>
> Yet, there's a considerable product backlog of possible enhancements
> [1] that never seems to get any shorter. Even worse, it can be
> expected that the backlog will refill quickly once the community
> notices that feature requests are suddenly being considered. The main
> reason for this unfortunate situation is that Tor Metrics contains
> many moving parts, including some heavy database lifting that takes
> place below the surface, all of which need to be maintained. Adding
> more parts just makes the whole thing even more likely to break. At
> the same time, it is painful to know that Tor Metrics has become
> almost closed to contributions.
>
> This posting shall discuss possible solutions. The goal is to let Tor
> Metrics grow in a healthy fashion that encourages contributions from
> the community. These solutions are not mutually exclusive, and the
> best solution may use parts of more than one solution sketched out here.
>
>
> 1 Make Tor Metrics better and bigger, internally
>
> The obvious solution is that the maintainers of Tor Metrics could just
> work harder to overcome the problems stated above. Let's think this
> through.
>
> 1.1 Add more development resources
>
> If only the current Tor Metrics maintainers had more time to devote to
> cleaning up existing parts and adding new parts, that would solve our
> problem. They could refactor parts that are hard to maintain, and
> they could work off the serious backlog that has piled up. Of course,
> this means dropping or handing over responsibilities for other
> products, and it may mean finding (and paying) new developers to help
> maintain Tor Metrics. It's unclear whether anything like this would
> fit into Tor's budget, and whether these changed priorities would make
> users of tools that had to be dropped or handed over unhappy.
>
> 1.2 Rewrite internal parts of Tor Metrics to encourage external
> contributions
>
> Most of Tor Metrics would have run 10 or 15 years ago with only minor
> modifications. It's not necessarily a bad thing to use established
> technologies. But maybe, if we rewrite it using modern
> data-processing, web, and visualization frameworks, it becomes more
> attractive to other developers to contribute code and help maintain
> existing (well, then rewritten) code. The result would be a larger
> Tor Metrics website that is easier to maintain and hopefully
> maintained by more people. It's unclear how realistic this plan is,
> though, and it requires attention from the Tor Metrics maintainers to
> bring it into good enough shape for external contributors to get involved.
>
I'm not 100% familiar with the whole process of adding a graph to Metrics, but
I know a bit about the Java code and data source setup it needs. For the
graphs I work with (see http://ygzf7uqcusp4ayjs.onion), I decided to go with
Munin for two reasons. First, the data sources for those graphs live on
different machines (three for now), and Munin offers a _super_ easy way to
have remote nodes where the server just learns what has been deployed, pulls
the data out, and graphs it automatically without any added configuration.
Second, I can use whatever language I want to generate those data points; in
my case, I use stem extensively with Python.
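To give an idea of how little glue that needs, here is a minimal sketch of
such a stem-based Munin plugin; the graph title, field names, and control
port are placeholders for the example, not what I actually run:

#!/usr/bin/env python
# Minimal sketch of a Munin plugin that feeds Tor traffic counters
# through stem.  Graph title and field names are invented for this
# example; control port and authentication are whatever the local
# tor is configured with.

import sys
from stem.control import Controller

def fetch_counters():
    # Ask the local tor over its control port how many bytes it has
    # read and written since it started (GETINFO traffic/read|written).
    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        read = int(controller.get_info("traffic/read"))
        written = int(controller.get_info("traffic/written"))
    return read, written

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        # Munin calls the plugin once with "config" to learn how to
        # label and draw the graph; no server-side setup needed.
        print("graph_title Tor traffic (example)")
        print("graph_vlabel bytes per second")
        print("graph_category tor")
        print("read.label read")
        print("read.type DERIVE")     # counter -> per-second rate
        print("read.min 0")
        print("written.label written")
        print("written.type DERIVE")
        print("written.min 0")
    else:
        # Normal runs just emit "field.value N" lines.
        read, written = fetch_counters()
        print("read.value %d" % read)
        print("written.value %d" % written)

Munin discovers the plugin and reads its "config" output by itself, which is
exactly the kind of zero-configuration deployment I mean.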
So, two things to consider here:
1) An _easy_ way to add and deploy new graphs. By that I mean not requiring
half a day of a metrics.tpo maintainer's time.
2) A way to keep data collection decoupled from the graphing mechanism. I
think Metrics is already quite good at that: it pulls CSV data from
collector.tpo (?) and then some Java/R programs graph it and generate an HTML
page. I think Onionoo is a good tool in that direction (as a data source).
If we could get that "Java/R" step to work with auto-discovery the way Munin
does, or with a very simple one-liner in a config file, or with a new script
dropped into a directory, it would be amazing. Furthermore, if a super epic
graph developer wants to contribute, having a way to run the metrics.tpo
framework locally on a dev machine so it's easy to test would be even more
epic.
There are plenty of tools nowadays that can help us do that without
reinventing all the things. Food for thought :).
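To make the auto-discovery part concrete, something along these lines would
already be close; the graphs.d/ directory, the data/ directory of pre-fetched
CSVs, and the "script takes a data dir and an output dir" contract are all
made up for the example:

#!/usr/bin/env python
# Rough sketch of Munin-style auto-discovery for metrics graphs.
# Everything here is hypothetical: a graphs.d/ directory of executable
# scripts, a data/ directory of CSV files already fetched from
# CollecTor/metrics, and an out/ directory for the rendered graphs.

import os
import subprocess

GRAPHS_DIR = "graphs.d"   # drop a new executable here to add a graph
DATA_DIR = "data"         # pre-fetched CSV files
OUT_DIR = "out"           # rendered PNG/HTML ends up here

def discover():
    # Any executable file in graphs.d/ is treated as a graph generator,
    # the same way Munin treats files in its plugin directory.
    for name in sorted(os.listdir(GRAPHS_DIR)):
        path = os.path.join(GRAPHS_DIR, name)
        if os.path.isfile(path) and os.access(path, os.X_OK):
            yield path

def main():
    if not os.path.isdir(OUT_DIR):
        os.makedirs(OUT_DIR)
    for script in discover():
        print("rendering %s" % script)
        # The whole "contract" is: script <data dir> <output dir>.
        subprocess.check_call([script, DATA_DIR, OUT_DIR])

if __name__ == "__main__":
    main()

Adding a graph would then mean dropping one executable into graphs.d/, and
testing it locally is just running that script against a local copy of the
CSVs.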
> 2 Add more ways to contribute to Tor Metrics externally
>
> It may be possible to further grow Tor Metrics without adding more
> code to it, hence not making it any harder to maintain. However, if
> code to generate visualizations is run elsewhere, there's a certain
> risk that the results are not perceived as being as trustworthy as if
> that code were run as part of Metrics. This is primarily a problem of setting
> user expectations right. We could add different ways for contributing
> to Tor Metrics, depending on the level of commitment that contributors
> are willing to make. Possible new ways (in addition to filing a Trac
> ticket, which is already possible, though not very effective) are:
I would always have the graphs generated on the metrics.tpo side. The data
source for a graph could be a remote machine, but then you end up in the
"security/authentication/trust" nightmare :S ...
If the bar to entry for new graphs is super low, that is, if it is technically
very easy to add a new one (both data source and graph), then someone could
submit a new visualization (as a Trac ticket), and the Metrics team could
review and merge it. Adding a graph as a "patch" would greatly help avoid
extra work for the Metrics team, but it needs to be easy, documented, and not
require a complicated framework to run (or at least to test that the graph
works for metrics.tpo).
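As an illustration of how low that bar could be, a contributed graph could be
a single short script like the one below; the CSV file name, column names,
and output file are invented for the example, and it assumes the same
script-plus-data-directory contract I sketched earlier:

#!/usr/bin/env python
# Hypothetical contributed graph following the contract sketched above:
# called as "script <data dir> <out dir>", reads one CSV, writes one PNG.
# The file name, column names, and title are made up for this sketch.

import csv
import os
import sys

import matplotlib
matplotlib.use("Agg")          # render to a file, no display needed
import matplotlib.pyplot as plt

def main(data_dir, out_dir):
    dates, users = [], []
    with open(os.path.join(data_dir, "userstats.csv")) as f:
        for row in csv.DictReader(f):
            dates.append(row["date"])
            users.append(float(row["users"]))

    step = max(1, len(dates) // 6)
    plt.figure(figsize=(8, 3))
    plt.plot(range(len(users)), users)
    plt.xticks(range(0, len(dates), step), dates[::step], rotation=30)
    plt.title("Estimated daily users (example)")
    plt.tight_layout()
    plt.savefig(os.path.join(out_dir, "userstats.png"))

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

Reviewing something like that is a short job for the Metrics team, and a
contributor can test it locally without any of the metrics.tpo machinery.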
>
> 2.1 Accept contribution of static data or static graphs
>
> Somebody might contribute data (in a tarball, download link, etc.) or
> a static graph (static as in "doesn't break, ever", not "static HTML
> with a tiny amount of JavaScript that will surely never break"). The
> Tor Metrics team reviews that and puts it on the Tor Metrics website,
> together with a short description, author information, license, etc.
> There are plenty of visualizations on Trac and on the mailing lists,
> so we'll have to define criteria for what we add and what we don't,
> and we'll need a good process for making that happen.
+1.
>
> 2.2 Link to external websites
>
> Somebody might write a website that visualizes Tor network data. The
> Tor Metrics team reviews the idea behind it, but does not necessarily look
> at its code, and adds an external link to Tor Metrics. It becomes
> obvious that the authors remain responsible for their visualization,
> so there's no risk involved for Tor Metrics, but users may not trust
> it as much, because it doesn't have the Tor Metrics label. Note that
> we're already taking this approach by linking to the visualizations
> showing "Tor users as percentage of larger Internet population" [2]
> and "Data flow in the Tor network" [3]. Also note that we could as
> well have hosted the former directly on Tor Metrics with appropriate
> attribution, because it's a static image. This is not the case with
> the latter.
It comes down to trust here, I would say. Like George said in his previous
email, we always have the luxury of removing the link if some crazy shit shows
up after a while, but an external site could also be a sneaky way to deliver
malware to users :).
So I would argue for putting our effort into making Metrics contributions so
easy that we only need to link to external websites for insane stuff like
https://torflow.uncharted.software (which we helped them with).
>
> 2.3 Run an externally developed website as if it were part of Tor Metrics
>
> Let's imagine that somebody produces a visualization of Tor network
> data and would like to make it part of Tor Metrics but without
> limiting themselves to the technology used by Tor Metrics. We could
> let them write their visualization as a website and integrate it into
> Tor Metrics after reviewing its code.
>
> Technically, part of this integration would be to "redress" the
> website by applying the Tor Metrics design (which has lots of room for
> improvement, but let's just say the result will look as seamlessly
> integrated into Tor Metrics as the "Network bubble graphs" [4]).
> Another part would probably be to rewrite web requests, so that users
> still think they're talking to https://metrics.torproject.org/, but
> really they're talking to another webserver behind that.
>
> Regarding hosting and maintenance, in theory, the website could be
> hosted by the original creators, but that effectively means that the
> Tor Metrics team gives up part of the control over what's on the Tor
> Metrics website. The creators of the external website could change
> parts or add new parts that wouldn't be reviewed by Tor Metrics
> developers, but they would be perceived as part of Metrics, which
> seems bad. The Tor Metrics team could run the externally developed
> website on a separate host or on the same host as Tor Metrics. We
> could imagine variants where the original creator stays around to fix
> any issues as they come up, or we could imagine that they donate their
> visualization, which the Tor Metrics people would then maintain. We
> could even imagine that the Tor Metrics maintainers some day decide to
> integrate the originally external website into Tor Metrics proper, but
> that would not be required for this model to work.
This goes back a bit to the discussion of the third point above.
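For what it's worth, the "rewrite web requests" part of 2.3 is just standard
reverse proxying, which the web server itself (Apache/nginx) would normally
handle. Just to illustrate the idea, here is a toy sketch in Python; the
backend address and path prefix are invented for the example:

#!/usr/bin/env python3
# Toy sketch of the "rewrite web requests" idea from 2.3: the metrics
# host answers every request itself, but quietly fetches some paths
# from another webserver behind it.  A real deployment would do this
# in the web server; the backend address and path prefix are made up.

from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKEND = "http://127.0.0.1:8080"   # externally developed visualization
PREFIX = "/external-viz"            # paths under metrics.tpo to forward

class RedressingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith(PREFIX):
            # Forward to the backend and relay the answer, so users only
            # ever see the metrics.tpo hostname in their address bar.
            tail = self.path[len(PREFIX):] or "/"
            with urlopen(BACKEND + tail) as resp:
                body = resp.read()
                ctype = resp.headers.get("Content-Type", "text/html")
            self.send_response(200)
            self.send_header("Content-Type", ctype)
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RedressingProxy).serve_forever()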
>
>
> All these ideas require writing down guidelines, criteria, and
> processes. In particular, they require more thought and input from
> other people who are not currently involved in Tor Metrics maintenance
> and who can be expected to be more objective. And once these ideas are
> implemented, we'll need more than just one Tor Metrics maintainer.
I would be very interested in hearing from people who actually use or develop
visualization tools these days, and in how we could transition to something
much better suited to external contributions.
What about a blog post on all of this, too?
Cheers!
David
>
> What are your thoughts?
>
> All the best,
> Karsten
>
>
> [0] https://metrics.torproject.org/
>
> [1]
> https://trac.torproject.org/projects/tor/query?status=!closed&component=Metrics
>
> [2] https://metrics.torproject.org/oxford-anonymous-internet.html
>
> [3] https://metrics.torproject.org/uncharted-data-flow.html
>
> [4] https://metrics.torproject.org/bubbles.html
>