[tor-bugs] #2923 [Metrics/Metrics website]: Improve materialized views in the metrics database
Tor Bug Tracker & Wiki
blackhole at torproject.org
Thu Jul 21 16:18:54 UTC 2016
#2923: Improve materialized views in the metrics database
-------------------------------------+--------------------------
Reporter: karsten | Owner:
Type: enhancement | Status: assigned
Priority: High | Milestone:
Component: Metrics/Metrics website | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-------------------------------------+--------------------------
Changes (by iwakeh):
* severity: => Normal
Old description:
> The [https://gitweb.torproject.org/metrics-
> web.git/blob/HEAD:/db/tordir.sql metrics database schema] uses
> periodically updated tables similar to materialized views for aggregating
> statistics. When inserting data into the database, we write the dates
> that have changed to a separate updates table. Every three hours, we
> delete the aggregates for these days and recompute them, which takes a
> few minutes.
>
> The recompute step that takes most of the time is `refresh_user_stats()`,
> which is no surprise given the complexity of that function. We should
> try to simplify this function, possibly by pre-computing partial results
> that can be reused for other statistics. Ideally, recomputing aggregates
> should run in under one minute, given that we want to add more
> materialized views for more aggregate statistics in the future. In
> particular, I'd like to know which particular SQL parts slow us down in
> order to avoid them in the future.
New description:
The [https://gitweb.torproject.org/metrics-
web.git/tree/modules/legacy/db/tordir.sql metrics database schema] uses
periodically updated tables similar to materialized views for aggregating
statistics. When inserting data into the database, we write the dates
that have changed to a separate updates table. Every three hours, we
delete the aggregates for these days and recompute them, which takes a few
minutes.
The recompute step that takes most of the time is `refresh_user_stats()`,
which is no surprise given the complexity of that function. We should try
to simplify this function, possibly by pre-computing partial results that
can be reused for other statistics. Ideally, recomputing aggregates
should run in under one minute, given that we want to add more
materialized views for more aggregate statistics in the future. In
particular, I'd like to know which particular SQL parts slow us down in
order to avoid them in the future.
--
Comment:
If this is still a problem I'd suggest using postgres analysis options in
order to find out if adding an index here and there would help.
(Corrected the SQL file link in the description.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2923#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list