[tor-dev] Python ExoneraTor

Karsten Loesing karsten at torproject.org
Tue Jun 10 07:38:00 UTC 2014


On 10/06/14 05:41, Damian Johnson wrote:
>>> let me make one remark about optimizing Postgres defaults: I wrote quite
>>> a few database queries in the past, and some of them perform horribly
>>> (relay search) whereas others perform really well (ExoneraTor).  I
>>> believe that the majority of performance gains can be achieved by
>>> designing good tables, indexes, and queries.  Only as a last resort
>>> should we consider optimizing the Postgres defaults.
>>>
>>> You realize that a searchable descriptor archive focuses much more on
>>> database optimization than the ExoneraTor rewrite from Java to Python
>>> (which would leave the database untouched)?
>>
>> Are other datastore models such as splunk or MongoDB useful?
>> [splunk has a free yet proprietary limited binary... those having
>> historical woes and takebacks, mentioned just for example here.]
> 
> Earlier I mentioned the idea of Dynamo. Unless I'm mistaken this lends
> itself pretty naturally to addresses as a hash key, and descriptor
> dates as the range key. Lookups would then be O(log(n)) where n is the
> total number of descriptors an address has published (... that is to
> say very, very quick).
> 
> This would be a fun project to give Boto a try. *sigh*... there really
> should be more hours in the day...
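
For readers following the key layout described above, here is a minimal
in-memory sketch of the idea in Python.  It is not DynamoDB or Boto code;
the class and method names are hypothetical, and a dict plus a sorted list
only stand in for the hash key (address) and range key (descriptor date).
The bisect lookups are what give the O(log(n)) behavior per address.

```python
import bisect
from collections import defaultdict
from datetime import date


class AddressIndex:
    """Hypothetical stand-in for a hash-key/range-key table."""

    def __init__(self):
        # Hash key: address -> sorted list of publication dates (range key).
        self._dates = defaultdict(list)

    def add(self, address, published):
        """Record that a descriptor for `address` was published on `published`."""
        dates = self._dates[address]
        pos = bisect.bisect_left(dates, published)
        # Keep the list sorted and free of duplicates.
        if pos == len(dates) or dates[pos] != published:
            dates.insert(pos, published)

    def published_between(self, address, start, end):
        """Return the dates in [start, end] for `address` via binary search."""
        dates = self._dates.get(address, [])
        lo = bisect.bisect_left(dates, start)
        hi = bisect.bisect_right(dates, end)
        return dates[lo:hi]


index = AddressIndex()
index.add("192.0.2.1", date(2014, 6, 10))
index.add("192.0.2.1", date(2014, 5, 1))
index.add("192.0.2.1", date(2014, 6, 1))
june = index.published_between("192.0.2.1", date(2014, 6, 1), date(2014, 6, 30))
```

Note that this only covers exact-address lookups; it says nothing about how
a real key-value store would perform on the full descriptor archive.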

Quoting my reply to Damian to a similar question earlier in the thread:

> I'm wary about moving to another database, especially NoSQL ones and/or cloud-based ones.  They don't magically make things faster, and Postgres is something I understand quite well by now. [...] Not saying that DynamoDB can't be the better choice, but switching the database is not a priority for me.

If somebody wants to give, say, MongoDB a try, I'd be interested in
seeing the performance comparison to the current Postgres schema.  When
you do, please consider all three search_* functions that the current
schema offers, including searches for other IPv4 addresses in the same
/24 and other IPv6 addresses in the same /48.
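
To illustrate what the /24 and /48 searches mean for any alternative
datastore, here is a minimal sketch using Python's standard ipaddress
module.  It does not reproduce the search_* functions of the current
schema; the helper names below are hypothetical and only show how an
address maps to the network that such a search would have to cover.

```python
import ipaddress


def search_network(address):
    """Return the enclosing /24 (IPv4) or /48 (IPv6) network of an address."""
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48
    # strict=False lets us pass a host address and get its network back.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


def same_network(a, b):
    """True if b lies in the same /24 (IPv4) or /48 (IPv6) as a."""
    ip_a = ipaddress.ip_address(a)
    ip_b = ipaddress.ip_address(b)
    return ip_a.version == ip_b.version and ip_b in search_network(a)


v4_net = search_network("192.0.2.77")
v6_net = search_network("2001:db8:1:2::1")
```

A pure hash-key lookup on the full address cannot answer these queries
directly, so a comparison would probably need to store or index the
network prefix as well.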

All the best,
Karsten


