[tor-bugs] #2921 [Metrics]: Improve bulk import of relay descriptors into metrics database
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Fri Apr 15 10:29:19 UTC 2011
#2921: Improve bulk import of relay descriptors into metrics database
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Metrics | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
We currently have two ways to import relay descriptors into the metrics
database:
- JDBC import: We have a Java importer that connects to the metrics
  database via JDBC. We use a few tweaks, such as committing in batches of
  up to 500 rows, but importing months of data is still time-consuming.
- psql \copy: The Java importer can be configured to parse relay
  descriptor files and write files for psql's \copy command. The
  disadvantage is that \copy does not handle duplicate rows well, so we
  have to pre-process the bulk import files.
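The duplicate pre-processing for \copy could look roughly like the Java
sketch below. It is an illustration only, not the existing importer's code:
the class and method names, the row layout (one String[] per row), and the
use of a single key column such as a descriptor digest are all assumptions.
It keeps the first row seen per key and writes each surviving row in
PostgreSQL's COPY text format (tab-separated, \N for NULL, and backslash
escapes for backslash, tab, newline, and carriage return). It only removes
duplicates within one file; rows that already exist in the database would
still make the import fail, because a single unique-constraint violation
aborts the whole COPY.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: prepares rows for psql's \copy by dropping rows
 * whose key was already seen. Names and row layout are assumptions.
 */
public class CopyFilePreprocessor {

    /** Escapes one value for PostgreSQL's COPY text format; null becomes \N. */
    static String escapeCopyValue(String value) {
        if (value == null) {
            return "\\N";
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\t': sb.append("\\t"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Keeps the first row for each value in column keyIndex and returns
     * the surviving rows as tab-separated COPY lines, in input order.
     */
    static List<String> dedupeForCopy(List<String[]> rows, int keyIndex) {
        // LinkedHashMap preserves insertion order; putIfAbsent keeps the first row.
        Map<String, String[]> firstByKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            firstByKey.putIfAbsent(row[keyIndex], row);
        }
        List<String> lines = new ArrayList<>();
        for (String[] row : firstByKey.values()) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    line.append('\t');
                }
                line.append(escapeCopyValue(row[i]));
            }
            lines.add(line.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical rows: (digest, published, nickname); the third repeats digest-a.
        List<String[]> rows = List.of(
            new String[] {"digest-a", "2011-04-01 00:00:00", "relay1"},
            new String[] {"digest-b", "2011-04-01 01:00:00", null},
            new String[] {"digest-a", "2011-04-01 00:00:00", "relay1"});
        dedupeForCopy(rows, 0).forEach(System.out::println);
    }
}
```

Running main prints two lines, because the third input row repeats the key
digest-a. Keeping the first occurrence in insertion order makes the output
file stable across runs, which helps when comparing import results.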
I wonder whether there are better approaches than these two, or ways to
improve how we implement them. It would be good to compare the performance
of both approaches, and of any improvements to them, when importing 1, 12,
and 24 months of data.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2921>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online