[tor-bugs] #6471 [Metrics Utilities]: Design file format and Python library for multiple GeoIP or AS databases
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Thu Aug 2 06:28:27 UTC 2012
#6471: Design file format and Python library for multiple GeoIP or AS databases
-------------------------------+--------------------------------------------
Reporter: karsten | Owner:
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Metrics Utilities | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------------+--------------------------------------------
Comment(by gsathya):
Pyonionoo had me distracted for a while; getting back to metrics stuff now.
A possible first step would be to figure out whether there is any
additional information in MaxMind's databases that we don't need or use.
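To make that first step concrete, here is a minimal Python sketch. It assumes the input follows the legacy MaxMind GeoIP Country CSV layout (start IP, end IP, start integer, end integer, country code, country name) and keeps only the integer range bounds and the country code. Whether that is really all we need is exactly the open question above. The function name strip_unneeded_columns is hypothetical.

```python
import csv
import io


def strip_unneeded_columns(maxmind_csv):
    """Reduce MaxMind-style rows to (start_int, end_int, country_code).

    Assumption: six quoted columns per row, as in the legacy GeoIP
    Country CSV. The csv module is used instead of str.split because
    country names such as "Korea, Republic of" contain commas.
    """
    rows = []
    for row in csv.reader(io.StringIO(maxmind_csv)):
        if not row:  # skip blank lines
            continue
        _, _, start_int, end_int, country_code, _ = row
        rows.append((int(start_int), int(end_int), country_code))
    return rows
```

Dropping the dotted-quad columns and the country name loses nothing a lookup needs, since the integer bounds carry the same range and the name can be derived from the code.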
A naive solution would then be:
Step 0) Remove unnecessary data.
Step 1) Diff the old CSV with the new CSV.
Step 2.1) Add a human-readable(?) line to the old CSV explaining the
date of the change, the number of lines changed, and possibly other
details that might become obvious once we actually try to diff.
Step 2.2) Modify the diff to make it more parseable. Since we know that we
are only diffing CSVs, I bet we can optimize this a bit.
Step 3) Append the modified diff to the old CSV.
Step 4) Write a library that can parse the added human-readable line and
the modified diff.
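A rough Python sketch of steps 1 through 4 follows. Everything about the file layout is an assumption for illustration: base rows are "start,end,cc", each appended block starts with a header line like "# update date=... removed=N added=M", and changed rows are prefixed with "-" or "+". Because each CSV row is keyed by its address range, the sketch diffs rows as sets rather than running a line-oriented diff, which is one guess at what "optimize this a bit" could mean. All names (diff_databases, format_update, parse_history) are hypothetical.

```python
from datetime import date

# Hypothetical layout: base rows "start,end,cc", then appended update blocks.
HEADER_PREFIX = "# update "


def diff_databases(old, new):
    """Step 1: return (removed, added); rows are (start, end, cc) tuples.

    Rows are compared as sets because each row is keyed by its address
    range, so line order does not matter.
    """
    old_set, new_set = set(old), set(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)


def format_update(update_date, removed, added):
    """Steps 2.1/2.2: a human-readable header line plus -/+ prefixed rows."""
    lines = [f"{HEADER_PREFIX}date={update_date.isoformat()} "
             f"removed={len(removed)} added={len(added)}"]
    lines += [f"-{s},{e},{cc}" for s, e, cc in removed]
    lines += [f"+{s},{e},{cc}" for s, e, cc in added]
    return "\n".join(lines) + "\n"


def _parse_row(text):
    start, end, cc = text.split(",")
    return int(start), int(end), cc


def parse_history(text):
    """Step 4: replay the base CSV and every appended update block.

    Returns a list of (date, rows) snapshots, oldest first. The base CSV
    gets date None because this sketch's base format carries no date.
    """
    current, snapshots, update_date = set(), [], None
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith(HEADER_PREFIX):
            # Close the previous snapshot before applying the new block.
            snapshots.append((update_date, sorted(current)))
            fields = dict(f.split("=", 1)
                          for f in line[len(HEADER_PREFIX):].split())
            update_date = date.fromisoformat(fields["date"])
        elif line.startswith("-"):
            current.discard(_parse_row(line[1:]))
        elif line.startswith("+"):
            current.add(_parse_row(line[1:]))
        else:
            current.add(_parse_row(line))
    snapshots.append((update_date, sorted(current)))
    return snapshots
```

Step 3 is then just string concatenation: write the old CSV, append format_update(...) for the new database, and parse_history recovers one snapshot per database. The header's counts are informational only here; a real parser could check them against the rows that follow.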
Another solution would be to go all out: write our own spec, plus a
parser that converts every newly published GeoIP database into something
that conforms to our spec (and a library to parse such a file).
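One possible shape for such a spec, sketched in Python purely as an assumption: store each (range, country) row once, together with the date of the first database that contained it and the date of the first database that no longer did. A converter then folds each newly published database into the merged list, and lookups take both an integer IP address and a date. Record, merge_snapshot, lookup, and dump are hypothetical names, and lookup scans linearly only to keep the sketch short; a real library would sort by range and bisect.

```python
from datetime import date
from typing import NamedTuple, Optional


class Record(NamedTuple):
    start: int
    end: int
    cc: str
    valid_from: date               # date of the first database with this row
    valid_until: Optional[date]    # first database without it; None = current


def merge_snapshot(records, snapshot, snapshot_date):
    """Fold a newly published database (rows of (start, end, cc)) in."""
    new_rows = set(snapshot)
    merged, still_open = [], set()
    for rec in records:
        key = (rec.start, rec.end, rec.cc)
        if rec.valid_until is None and key not in new_rows:
            rec = rec._replace(valid_until=snapshot_date)  # row disappeared
        elif rec.valid_until is None:
            still_open.add(key)  # row is still present, keep it open
        merged.append(rec)
    # Rows that are new (or reappeared) start a fresh validity interval.
    for start, end, cc in sorted(new_rows - still_open):
        merged.append(Record(start, end, cc, snapshot_date, None))
    return merged


def lookup(records, ip, when):
    """Return the country code for integer IP `ip` as of `when`, or None."""
    for rec in records:
        if (rec.start <= ip <= rec.end and rec.valid_from <= when
                and (rec.valid_until is None or when < rec.valid_until)):
            return rec.cc
    return None


def dump(records):
    """Serialize to a hypothetical "start,end,cc,from,until" line format."""
    return "".join(
        f"{r.start},{r.end},{r.cc},{r.valid_from.isoformat()},"
        f"{r.valid_until.isoformat() if r.valid_until else ''}\n"
        for r in records)
```

The appeal of this layout is that answering "which country was this relay in on date X" needs no replaying of diffs; the cost is that the converter has to be rerun for every new MaxMind release.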
The second approach would be a lot more useful in the long run, but a lot
more time-consuming to write. Whichever approach we pick (or an
alternative one), I'd be happy to write the Python code for it!
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/6471#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online