[tor-bugs] #11573 [Onionoo]: Ponder using a database for Onionoo rather than keeping indexes in memory and contents on disk
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Sep 15 13:24:06 UTC 2014
#11573: Ponder using a database for Onionoo rather than keeping indexes in memory
and contents on disk
-----------------------------+-----------------
Reporter: karsten | Owner:
Type: enhancement | Status: new
Priority: minor | Milestone:
Component: Onionoo | Version:
Resolution: | Keywords:
Actual Points: | Parent ID:
Points: |
-----------------------------+-----------------
Comment (by karsten):
Replying to [comment:7 iwakeh]:
> * It might also be interesting to look at the access log in terms of
> how many requests were there at a given instant in time.
I just did a very quick analysis of the access logs. In particular, I
looked at the number of requests coming in per minute. Here's the summary:
{{{
   Min.   :  539.0
   1st Qu.:  728.5
   Median :  781.0
   Mean   :  788.9
   3rd Qu.:  849.0
   Max.   : 1151.0
}}}
That's relatively stable load, I'd say.
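For reference, here's roughly how such a per-minute summary could be
computed. The file name, timestamp format, and quantile arithmetic below
are simplifying assumptions, not how I actually ran the numbers:

{{{
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RequestsPerMinute {
  public static void main(String[] args) throws IOException {
    /* Count requests per minute by truncating each line's timestamp
     * to minute precision.  Assumes one request per line and an
     * Apache-style timestamp like [15/Sep/2014:13:24:06 +0000]. */
    Map<String, Integer> perMinute = new HashMap<>();
    for (String line : Files.readAllLines(Paths.get("access.log"))) {
      int start = line.indexOf('[');
      if (start < 0 || line.length() < start + 18) {
        continue;
      }
      perMinute.merge(line.substring(start + 1, start + 18), 1,
          Integer::sum);
    }
    List<Integer> counts = new ArrayList<>(perMinute.values());
    if (counts.isEmpty()) {
      return;
    }
    Collections.sort(counts);
    long sum = 0L;
    for (int count : counts) {
      sum += count;
    }
    int n = counts.size();
    System.out.printf("Min.   : %d%n1st Qu.: %d%nMedian : %d%n"
        + "Mean   : %.1f%n3rd Qu.: %d%nMax.   : %d%n",
        counts.get(0), counts.get(n / 4), counts.get(n / 2),
        (double) sum / n, counts.get(3 * n / 4), counts.get(n - 1));
  }
}
}}}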
> * Response times could be longer if the connection of the client is
> slow, or not correctly terminated, or in case of too many concurrent
> session threads (that's why I asked about access logs). Too many
> concurrent sessions would additionally slow down lookup and compile.
Agreed that response times depend to a large extent on the client
connection. That's why I temporarily modified the code to separate
compiling the response from sending it.
Regarding concurrent sessions, I could imagine measuring that in the code:
whenever a new `RequestHandler` or `ResponseBuilder` instance is created,
we increment a counter, and whenever an instance is released, we decrement
that counter. For statistics, we print out what fraction of the time the
counter was at 0, 1, 2, etc. The question is just what we'd do with these
statistics. Maybe we should first resolve #13089 to see what options we'd
have to fine-tune the server, for example, by changing thread pool sizes
or something.
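A minimal sketch of what that could look like; the class below is
invented for illustration, not actual Onionoo code:

{{{
/* Tracks how many requests are in flight and how much wall-clock time
 * was spent at each concurrency level 0, 1, 2, etc. */
public class ConcurrencyStats {

  private static final int MAX_LEVEL = 64;

  private int current = 0;
  private final long[] nanosAtLevel = new long[MAX_LEVEL + 1];
  private long lastChange = System.nanoTime();

  /* Call whenever a RequestHandler or ResponseBuilder is created. */
  public synchronized void enter() {
    record();
    current++;
  }

  /* Call whenever an instance is released. */
  public synchronized void exit() {
    record();
    current--;
  }

  private void record() {
    long now = System.nanoTime();
    nanosAtLevel[Math.min(current, MAX_LEVEL)] += now - lastChange;
    lastChange = now;
  }

  /* Print what fraction of the time the counter was at 0, 1, 2, etc. */
  public synchronized void printStats() {
    record();
    long total = 0L;
    for (long nanos : nanosAtLevel) {
      total += nanos;
    }
    for (int level = 0; level <= MAX_LEVEL; level++) {
      if (nanosAtLevel[level] > 0L) {
        System.out.printf("counter at %d: %.1f%% of the time%n",
            level, 100.0 * nanosAtLevel[level] / total);
      }
    }
  }
}
}}}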
> * Maybe also separate the lookup and prepare measurements "physically"
> from the response times. Separate lab measurements for lookup and
> compile could easily be compared to db solutions. Separate
> response-time measurements in production and in the lab might help
> identify server issues.
These steps are already separated: lookup is what `RequestHandler` does,
and compile+respond is what `ResponseBuilder` does. It should be pretty
easy to measure these steps separately in a lab setting.
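For example, a lab harness could be as simple as the following sketch,
where the two `Runnable`s would wrap the actual `RequestHandler` and
`ResponseBuilder` calls (whose exact signatures I'm glossing over here):

{{{
/* Times two phases separately; the Runnables stand in for the actual
 * lookup (RequestHandler) and compile (ResponseBuilder) steps. */
public class PhaseTimer {

  public static void main(String[] args) {
    long[] nanos = timePhases(
        () -> { /* lookup: what RequestHandler does */ },
        () -> { /* compile: what ResponseBuilder does */ });
    System.out.printf("lookup: %d ms, compile: %d ms%n",
        nanos[0] / 1_000_000L, nanos[1] / 1_000_000L);
  }

  static long[] timePhases(Runnable lookup, Runnable compile) {
    long start = System.nanoTime();
    lookup.run();
    long afterLookup = System.nanoTime();
    compile.run();
    long afterCompile = System.nanoTime();
    return new long[] { afterLookup - start,
        afterCompile - afterLookup };
  }
}
}}}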
> * Pondering the database question should not only be done because of
> performance reasons, but also in the light of what Onionoo serves.
> Many requests, especially those using the history subtype, are
> database concerns. Future requests (like #13137) are, too. Databases
> are made for that.
Even with requests like #13137, we shouldn't give up the current model
of performing queries based on a small subset of attributes and putting
together prepared response parts.
> - Why reprogram the functionality? The reprogramming won't scale
> forever, and what could turn out to be more difficult is the code
> maintenance.
> - A highly optimized "query" on a proprietary in-memory index might
> pose a problem when it needs to be extended or changed. An SQL query
> is way more readable, because the database takes care of optimization
> (of course, one needs to define indexes and the like).
>
> So, maybe just think about using a database anyway.
Agreed. I'm mostly thinking that it's not a priority to do this, but I
don't question that using a database with good indexes is the better
design.
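Just to illustrate the readability argument, here's what a
history-style lookup might look like against a hypothetical PostgreSQL
schema. Table and column names are invented; the point is merely that
the database, not our code, plans the lookup:

{{{
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HistoryQuery {

  public static void main(String[] args) throws SQLException {
    /* Hypothetical schema: bandwidth_history(fingerprint,
     * interval_end, bytes_read, bytes_written) with an index on
     * (fingerprint, interval_end).  The query stays readable as
     * requirements grow; the database picks the access path. */
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/onionoo", "onionoo", "");
        PreparedStatement statement = conn.prepareStatement(
            "SELECT interval_end, bytes_read, bytes_written"
            + " FROM bandwidth_history"
            + " WHERE fingerprint = ?"
            + " AND interval_end >= NOW() - INTERVAL '3 months'"
            + " ORDER BY interval_end")) {
      statement.setString(1, args[0]);
      try (ResultSet rs = statement.executeQuery()) {
        while (rs.next()) {
          System.out.printf("%s %d %d%n", rs.getTimestamp(1),
              rs.getLong(2), rs.getLong(3));
        }
      }
    }
  }
}
}}}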
> In this light, I would not opt for SQLite anymore, but start with a
> reasonable GPL database system.
Traditionally, there's a preference for PostgreSQL at Tor. Our sysadmins
and devs have experience with that.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11573#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online