[tor-dev] [GSoC 2013] Status report - Searchable metrics archive
Kostas Jakeliunas
kostas at jakeliunas.com
Sat Aug 10 19:28:44 UTC 2013
Hello,
another busy benchmarking + profiling period for database querying, but
this time more rigorous and awesome.
* wrote a generic query analyzer which logs query statements, EXPLAIN,
ANALYZE, spots and informs of particular queries that yield inefficient
query plans;
* wrote a very simple but rather exhaustive profiler (using python's
cProfile) which logs query times, function calls, etc.; output is used to
see which parts of the e.g. backend are slow during API calls; output can
be easily used to construct a general query 'profile' for a particular
database, etc.; [1]
* benchmarked lots of different queries using these tools, recorded query
times, was able to observe deviations/discrepancies;
* uploaded the whole database and benchmarked briefly on an amazon EC2
m2.2xlarge instance;
* concluded that, provided there is enough memory to cache *and hold* the
indexes in cache, query times are good;
* in particular, tested the following query scheme extensively: [2] (see
comments there as well if curious); concluded that it runs well;
* opted for testing raw SQL queries (from within Flask/python) - so far,
translating them into ORM queries (while being careful) resulted in
degraded performance; if we have to end up using raw SQL, I will create a
way to encapsulate them nicely;
* made sure data importing is not slowed and remains a quick-enough
procedure;
* researched PostgreSQL stuff, especially its two-layer caching; I now
have an understanding of the way pgsql caches things in memory, how
statistics on index usage are gathered and used for maintaining
buffer_cache, etc.
The searchable metrics archive would work best when all of its indexes are
kept in memory.
* to this end, looked into buffer cache hibernation [3], etc.; I think
pg_prewarm [4, 5] would serve our purpose well. (Apparently many
business/etc. solutions do find cache prewarming relevant - pity it's not
supported in stock PostgreSQL.)
The latter means that
* I don't think we can avoid using certain postgresql extensions (if only
one) - which means that deploying will always take more than apt-get && pip
install, but I believe it is needed;
* next on my agenda is testing pg_prewarm on EC2 and, hopefully, putting
our beloved database bottleneck problem to rest.
I planned to expose the EC2 for public tor-dev inquiry (and ended up
delaying status report yet again), but I'll have to do this separately.
This is possible, however. Sorry for the delayed report.
##
More generally,
I'm happy with my queer queries [2] now;
the two constraints/goals of
* being able to run Onionoo-like queries on the whole descriptor / status
entry database
* being able to get a list of status entries for a particular relay
will hopefully be put to rest very soon. The former is done, provided I
have no trouble setting up a database index precaching system (which will
ensure that all queries of the same syntax/scheme run quick enough.)
Overall, I'm spending a bit too much time on a specific problem, but at
least I have a more intimate lower-level knowledge of PostgreSQL, which
turns out to be very relevant to this project. I hope to be able to soon
move to extending Onionoo support and providing a clean API for getting
lists of consensuses in which a particular relay was present. And maybe
start with the frontend. :)
Kostas.
[1]:
https://github.com/wfn/torsearch/commit/8e6f16a07c40f7806e98e9c71c1ce0f8e3849911
[2]: https://github.com/wfn/torsearch/blob/master/misc/nested_join.sql
[3]:
http://postgresql.1045698.n5.nabble.com/patch-for-new-feature-Buffer-Cache-Hibernation-td4370109.html
[4]:
http://www.postgresql.org/message-id/CA+TgmobRrRxCO+t6gcQrw_dJw+Uf9ZEdwf9beJnu+RB5TEBjEw@mail.gmail.com
[5]: http://raghavt.blogspot.com/2012/04/caching-in-postgresql.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20130810/5f5e913a/attachment.html>
More information about the tor-dev
mailing list