[tor-bugs] #6471 [Metrics Utilities]: Design file format and Python/Java library for multiple GeoIP or AS databases (was: Design file format and Python library for multiple GeoIP or AS databases)
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Sat Sep 8 20:30:54 UTC 2012
#6471: Design file format and Python/Java library for multiple GeoIP or AS
databases
-------------------------------+--------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: needs_review
Priority: normal | Milestone:
Component: Metrics Utilities | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------------+--------------------------------------------
Changes (by karsten):
* status: assigned => needs_review
Comment:
Replying to [comment:8 rransom]:
> #2506 has a link to possibly-useful Python code.
Maybe, yes. Interesting discussion there! The focus is somewhat
different, though. I'd sacrifice a compact representation of the data
structure holding multiple GeoIP databases for better lookup speed. We
might store dozens of GeoIP databases in a single structure for metrics,
and we'll do bulk lookups. If that takes tens of MiB on disk and a
multiple of that in RAM, but in return finishes within a few minutes
rather than hours, I'm okay with that.
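To make the trade-off concrete, here is a minimal sketch of a
speed-over-size lookup structure. It is an illustration only, not the
actual implementation: the class name MultiGeoipSketch, the methods
addRange and lookup, the YYYYMMDD date keys, and the sample ranges are all
hypothetical. It assumes IPv4 only and that ranges within one database do
not overlap. Every database keeps its own full sorted map of ranges, so
memory grows with the number of databases, but each lookup is just two
logarithmic searches.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one sorted range map per GeoIP database. */
final class MultiGeoipSketch {

  /** End address (inclusive) and country code of one address range. */
  private static final class Range {
    final long end;
    final String code;

    Range(long end, String code) {
      this.end = end;
      this.code = code;
    }
  }

  /** Database date (YYYYMMDD) -> (range start address -> range). */
  private final TreeMap<String, TreeMap<Long, Range>> databases =
      new TreeMap<>();

  /** Converts a dotted-quad IPv4 address to its unsigned 32-bit value. */
  static long toLong(String address) {
    String[] parts = address.split("\\.");
    if (parts.length != 4) {
      throw new IllegalArgumentException("Not an IPv4 address: " + address);
    }
    long result = 0L;
    for (String part : parts) {
      int octet = Integer.parseInt(part);
      if (octet < 0 || octet > 255) {
        throw new IllegalArgumentException("Bad octet in: " + address);
      }
      result = (result << 8) | octet;
    }
    return result;
  }

  /** Adds one range (inclusive bounds) to the database with the given date. */
  public void addRange(String dbDate, String start, String end, String code) {
    databases.computeIfAbsent(dbDate, k -> new TreeMap<>())
        .put(toLong(start), new Range(toLong(end), code));
  }

  /**
   * Looks up an address in the most recent database dated on or before
   * the given date. Returns null if there is no such database or if the
   * address is not covered by any range in it.
   */
  public String lookup(String address, String date) {
    Map.Entry<String, TreeMap<Long, Range>> db = databases.floorEntry(date);
    if (db == null) {
      return null;
    }
    long value = toLong(address);
    Map.Entry<Long, Range> range = db.getValue().floorEntry(value);
    if (range == null || value > range.getValue().end) {
      return null;
    }
    return range.getValue().code;
  }

  public static void main(String[] args) {
    MultiGeoipSketch geoip = new MultiGeoipSketch();
    // Made-up sample data: the same range assigned differently over time.
    geoip.addRange("20120801", "1.0.0.0", "1.0.0.255", "au");
    geoip.addRange("20120901", "1.0.0.0", "1.0.0.255", "cn");
    System.out.println(geoip.lookup("1.0.0.7", "20120815"));
    System.out.println(geoip.lookup("1.0.0.7", "20120908"));
  }
}
```

A more compact design would merge ranges that are identical across
databases; the sketch deliberately skips that and duplicates them, which is
the kind of space cost described above.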
So, I produced an
[https://gitweb.torproject.org/karsten/metrics-tasks.git/shortlog/refs/heads/task-6471 initial version]
of the multi-GeoIP database in Java. The unit tests I wrote all succeed,
but there might still be edge cases that break. I'd much appreciate a
review of the design, which is described in the comments. If the design
turns out to be good, we'll want a Python implementation of this Java
thing. (Such an implementation might benefit from the Python code in
#2506.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6471#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online