[tor-bugs] #11573 [Onionoo]: Ponder using a database for Onionoo rather than keeping indexes in memory and contents on disk
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Sep 15 13:24:06 UTC 2014
#11573: Ponder using a database for Onionoo rather than keeping indexes in memory
and contents on disk
-----------------------------+-----------------
Reporter: karsten | Owner:
Type: enhancement | Status: new
Priority: minor | Milestone:
Component: Onionoo | Version:
Resolution: | Keywords:
Actual Points: | Parent ID:
Points: |
-----------------------------+-----------------
Comment (by karsten):
Replying to [comment:7 iwakeh]:
> * It might also be interesting to look at the access log in terms of
> how many requests were there at a given instant in time.
I just did a very quick analysis of the access logs. In particular, I
looked at the number of requests coming in per minute. Here's the summary:
{{{
   Min.   :  539.0
   1st Qu.:  728.5
   Median :  781.0
   Mean   :  788.9
   3rd Qu.:  849.0
   Max.   : 1151.0
}}}
That's relatively stable load, I'd say.
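For reference, here's roughly how such a per-minute summary could be
computed. The file name, timestamp format, and quantile arithmetic below
are simplifying assumptions, not how I actually ran the numbers:

{{{
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RequestsPerMinute {
  public static void main(String[] args) throws IOException {
    /* Count requests per minute by truncating each line's timestamp
     * to minute precision.  Assumes one request per line and an
     * Apache-style timestamp like [15/Sep/2014:13:24:06 +0000]. */
    Map<String, Integer> perMinute = new HashMap<>();
    for (String line : Files.readAllLines(Paths.get("access.log"))) {
      int start = line.indexOf('[');
      if (start < 0 || line.length() < start + 18) {
        continue;
      }
      perMinute.merge(line.substring(start + 1, start + 18), 1,
          Integer::sum);
    }
    List<Integer> counts = new ArrayList<>(perMinute.values());
    if (counts.isEmpty()) {
      return;
    }
    Collections.sort(counts);
    long sum = 0L;
    for (int count : counts) {
      sum += count;
    }
    int n = counts.size();
    System.out.printf("Min.   : %d%n1st Qu.: %d%nMedian : %d%n"
        + "Mean   : %.1f%n3rd Qu.: %d%nMax.   : %d%n",
        counts.get(0), counts.get(n / 4), counts.get(n / 2),
        (double) sum / n, counts.get(3 * n / 4), counts.get(n - 1));
  }
}
}}}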
> * Response times could be longer if the connection of the client is
> slow, or not correctly terminated, or in case of too many concurrent
> session threads (that's why I asked about access logs). Too many
> concurrent sessions would additionally slow down lookup and compile.
Agreed that response times depend to a large extent on the client
connection. That's why I temporarily modified the code to separate
compiling the response from sending it.
Regarding concurrent sessions, I could imagine measuring that in the code:
whenever a new `RequestHandler` or `ResponseBuilder` instance is created,
we increment a counter, and whenever an instance is released, we decrement
that counter. For statistics, we print out what fraction of the time the
counter was at 0, 1, 2, etc. The question is just what we'd do with these
statistics. Maybe we should first resolve #13089 to see what options we'd
have to fine-tune the server, for example, by changing thread pool sizes
or something.
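A minimal sketch of what that could look like; the class below is
invented for illustration, not actual Onionoo code:

{{{
/* Tracks how many requests are in flight and how much wall-clock time
 * was spent at each concurrency level 0, 1, 2, etc. */
public class ConcurrencyStats {

  private static final int MAX_LEVEL = 64;

  private int current = 0;
  private final long[] nanosAtLevel = new long[MAX_LEVEL + 1];
  private long lastChange = System.nanoTime();

  /* Call whenever a RequestHandler or ResponseBuilder is created. */
  public synchronized void enter() {
    record();
    current++;
  }

  /* Call whenever an instance is released. */
  public synchronized void exit() {
    record();
    current--;
  }

  private void record() {
    long now = System.nanoTime();
    nanosAtLevel[Math.min(current, MAX_LEVEL)] += now - lastChange;
    lastChange = now;
  }

  /* Print what fraction of the time the counter was at 0, 1, 2, etc. */
  public synchronized void printStats() {
    record();
    long total = 0L;
    for (long nanos : nanosAtLevel) {
      total += nanos;
    }
    for (int level = 0; level <= MAX_LEVEL; level++) {
      if (nanosAtLevel[level] > 0L) {
        System.out.printf("counter at %d: %.1f%% of the time%n",
            level, 100.0 * nanosAtLevel[level] / total);
      }
    }
  }
}
}}}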
> * Maybe also separate the lookup and prepare measurements "physically"
> from the response times. Separate lab measurements for lookup and
> compile could easily be compared to db solutions. Separate
> response-time measurements in production and in the lab might help
> identify server issues.
These steps are already separated: lookup is what `RequestHandler` does,
and compile+respond is what `ResponseBuilder` does. It should be pretty
easy to measure these steps separately in a lab setting.
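For example, a lab harness could be as simple as the following sketch,
where the two `Runnable`s would wrap the actual `RequestHandler` and
`ResponseBuilder` calls (whose exact signatures I'm glossing over here):

{{{
/* Times two phases separately; the Runnables stand in for the actual
 * lookup (RequestHandler) and compile (ResponseBuilder) steps. */
public class PhaseTimer {

  public static void main(String[] args) {
    long[] nanos = timePhases(
        () -> { /* lookup: what RequestHandler does */ },
        () -> { /* compile: what ResponseBuilder does */ });
    System.out.printf("lookup: %d ms, compile: %d ms%n",
        nanos[0] / 1_000_000L, nanos[1] / 1_000_000L);
  }

  static long[] timePhases(Runnable lookup, Runnable compile) {
    long start = System.nanoTime();
    lookup.run();
    long afterLookup = System.nanoTime();
    compile.run();
    long afterCompile = System.nanoTime();
    return new long[] { afterLookup - start,
        afterCompile - afterLookup };
  }
}
}}}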
> * Pondering the database question should not only be done because of
> performance reasons, but also in the light of what Onionoo serves.
> Many requests, especially those using the history subtype, are
> database concerns. Future requests (like #13137) are, too. Databases
> are made for that.
Even with requests like #13137, we shouldn't give up the current model
of performing queries based on a small subset of attributes and putting
together prepared response parts.
> - Why reprogram the functionality? The reprogramming won't scale
> forever, and what could turn out to be more difficult is the code
> maintenance.
> - A highly optimized "query" on a proprietary in-memory index might
> pose a problem when it needs to be extended or changed. An SQL query
> is way more readable, because the database takes care of optimization
> (of course, one needs to define indexes and the like).
>
> So, maybe just think about using a database anyway.
Agreed. I'm mostly thinking that it's not a priority to do this, but I
don't question that using a database with good indexes is the better
design.
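Just to illustrate the readability argument, here's what a
history-style lookup might look like against a hypothetical PostgreSQL
schema. Table and column names are invented; the point is merely that
the database, not our code, plans the lookup:

{{{
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class HistoryQuery {

  public static void main(String[] args) throws SQLException {
    /* Hypothetical schema: bandwidth_history(fingerprint,
     * interval_end, bytes_read, bytes_written) with an index on
     * (fingerprint, interval_end).  The query stays readable as
     * requirements grow; the database picks the access path. */
    try (Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/onionoo", "onionoo", "");
        PreparedStatement statement = conn.prepareStatement(
            "SELECT interval_end, bytes_read, bytes_written"
            + " FROM bandwidth_history"
            + " WHERE fingerprint = ?"
            + " AND interval_end >= NOW() - INTERVAL '3 months'"
            + " ORDER BY interval_end")) {
      statement.setString(1, args[0]);
      try (ResultSet rs = statement.executeQuery()) {
        while (rs.next()) {
          System.out.printf("%s %d %d%n", rs.getTimestamp(1),
              rs.getLong(2), rs.getLong(3));
        }
      }
    }
  }
}
}}}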
> In this light, I would not opt for SQLite anymore, but start with a
> reasonable GPL database system.
Traditionally, there's a preference for PostgreSQL at Tor. Our sysadmins
and devs have experience with that.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/11573#comment:8>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online