[tor-bugs] #5247 [Onionoo]: Include reverse DNS lookup results in details
Tor Bug Tracker & Wiki
torproject-admin at torproject.org
Mon Feb 27 15:12:26 UTC 2012
#5247: Include reverse DNS lookup results in details
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Onionoo | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
We should run reverse DNS lookups and include their results in details
documents. What's the best way to run these lookups in Java? Also, do we
have to run them every hour for every relay?
I wrote a simple Java application that looks up host names using the
following line of code:
{{{
InetAddress.getByName(address).getHostName()
}}}
The application also measures how long each lookup took, in milliseconds.
I ran it for the first 1000 relays in the consensus published on
2012-02-18 at 03:00:00. Here are some simple statistics:
{{{
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  114.0   688.8  1032.0  1906.0  1628.0 81120.0
}}}
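For illustration, a measurement like this could be sketched as follows.
This is not the original application: the class name `LookupTimer` and
its helper methods are made up for this sketch, and only the
`InetAddress` line is taken from above. Addresses are passed as
command-line arguments.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch, not the application used for the numbers above.
class LookupTimer {

  /** Reverse lookup using the code line from the ticket. */
  static String reverseLookup(String address) {
    try {
      return InetAddress.getByName(address).getHostName();
    } catch (UnknownHostException e) {
      return address; // fall back to the address itself
    }
  }

  /** Runs the resolver once per address, sequentially, and returns the
   * elapsed wall-clock time of each lookup in milliseconds. */
  static long[] timeLookups(List<String> addresses,
      Function<String, String> resolver) {
    long[] millis = new long[addresses.size()];
    for (int i = 0; i < millis.length; i++) {
      long started = System.nanoTime();
      resolver.apply(addresses.get(i));
      millis[i] = (System.nanoTime() - started) / 1_000_000L;
    }
    return millis;
  }

  /** Arithmetic mean; NaN for an empty array. */
  static double mean(long[] values) {
    double sum = 0.0;
    for (long value : values) {
      sum += value;
    }
    return sum / values.length;
  }

  /** Median of a non-empty array; averages the two middle values if
   * the number of values is even. */
  static double median(long[] values) {
    long[] sorted = values.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1
        ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  public static void main(String[] args) {
    long[] millis = timeLookups(Arrays.asList(args),
        LookupTimer::reverseLookup);
    if (millis.length > 0) {
      System.out.printf("%d lookups: median %.1f ms, mean %.1f ms%n",
          millis.length, median(millis), mean(millis));
    }
  }
}
```

The resolver is passed in as a function so that the timing loop can be
tried without touching the network.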
So, looking up all 2759 relays in the consensus one after another, at a
mean of 1.9 seconds per lookup, would have taken about 1.5 hours.
Sequentially looking up reverse DNS entries for all relays in a consensus
every hour is therefore not feasible. We'll need to make some
optimizations before even starting. Questions are:
- Is there a faster way to look up reverse DNS entries than the one used
in this simple Java application?
- Can we group multiple lookups and make a single request for them?
- How often do we need to refresh a reverse DNS lookup result? In theory
we could cache results for an arbitrary time, but would they still be
accurate after 3, 6, 12, 24 hours?
- How many requests can we make in parallel using Java threads? The Java
side is easy and probably doesn't eat too much CPU time, but would we
trigger some mechanism at our ISP when we make 100 requests at a time?
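The Java side of the last question could look roughly like the following
sketch. It is an assumption about how one might do it, not existing
Onionoo code: the class name `ParallelReverseLookup`, the `lookupAll`
method, and the pool size used in `main` are hypothetical, and the sketch
says nothing about which pool size an ISP would tolerate.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of parallel reverse lookups with a fixed pool.
class ParallelReverseLookup {

  /** Reverse lookup using the code line from the ticket. */
  static String reverseLookup(String address) {
    try {
      return InetAddress.getByName(address).getHostName();
    } catch (UnknownHostException e) {
      return address; // fall back to the address itself
    }
  }

  /** Resolves all addresses using at most `threads` concurrent lookups.
   * Returns address -> host name in input order; a lookup that threw
   * an exception maps to null. */
  static Map<String, String> lookupAll(List<String> addresses,
      int threads, Function<String, String> resolver) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<String>> tasks = new ArrayList<>();
      for (String address : addresses) {
        tasks.add(() -> resolver.apply(address));
      }
      // invokeAll blocks until all tasks are done and returns the
      // futures in the same order as the task list.
      List<Future<String>> futures = pool.invokeAll(tasks);
      Map<String, String> results = new LinkedHashMap<>();
      for (int i = 0; i < addresses.size(); i++) {
        try {
          results.put(addresses.get(i), futures.get(i).get());
        } catch (ExecutionException e) {
          results.put(addresses.get(i), null);
        }
      }
      return results;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted during lookups", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    int threads = 5; // placeholder value, to be tuned
    Map<String, String> names = lookupAll(Arrays.asList(args), threads,
        ParallelReverseLookup::reverseLookup);
    names.forEach((address, name) ->
        System.out.println(address + " " + name));
  }
}
```

A fixed pool caps the number of lookups in flight, so the pool size is
the single knob to turn if the request rate turns out to be a problem.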
Here are some comments after talking to George and Damian:
- An average lookup time of 1.9 seconds per request isn't that unusual.
- Using a thread pool with 5 lookup threads should be a fine start.
- Caching results for 12 hours should work fine. It's much more likely
that a relay IP address changes than that the host name changes. We could
also keep some simple statistics on how often host names actually change
between lookups; if the fraction is higher than we'd like it to be, we can
still reduce the caching period to 6 hours or less. We should document in
protocol.html how often host names are looked up.
- Performing multiple lookups per request would be cool, but is probably
not supported by Java libraries.
- I re-ran the analysis above, but this time with the `host` tool instead
of Java. Lookup times are much lower, so something in Java must be slowing
down the lookup. More research is needed.
{{{
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0320  0.1800  0.3780  0.4252  0.5420 12.0300
}}}
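The caching idea from the comments above, including the statistics on
how often host names change, could be sketched like this. Everything
here is an assumption made for illustration: the class name
`ReverseDnsCache`, its methods, and the choice to count a change only
when an expired entry is refreshed. The sketch is not thread-safe and
would need synchronization before being combined with a thread pool.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of a time-limited reverse DNS cache.
class ReverseDnsCache {

  private static final class Entry {
    final String hostName;
    final Instant lookedUp;

    Entry(String hostName, Instant lookedUp) {
      this.hostName = hostName;
      this.lookedUp = lookedUp;
    }
  }

  private final Map<String, Entry> entries = new HashMap<>();
  private final Duration maxAge;
  private final Function<String, String> resolver;
  private int refreshes; // expired entries that were looked up again
  private int changes;   // refreshes that returned a different name

  ReverseDnsCache(Duration maxAge, Function<String, String> resolver) {
    this.maxAge = maxAge;
    this.resolver = resolver;
  }

  /** Returns the cached host name if it is younger than maxAge at time
   * `now`; otherwise runs a fresh lookup and stores the result. */
  String getHostName(String address, Instant now) {
    Entry cached = entries.get(address);
    if (cached != null
        && Duration.between(cached.lookedUp, now).compareTo(maxAge) < 0) {
      return cached.hostName;
    }
    String fresh = resolver.apply(address);
    if (cached != null) {
      refreshes++;
      if (!Objects.equals(cached.hostName, fresh)) {
        changes++;
      }
    }
    entries.put(address, new Entry(fresh, now));
    return fresh;
  }

  /** Fraction of refreshes in which the host name changed, or 0.0 if
   * no entry has been refreshed yet. */
  double changedFraction() {
    return refreshes == 0 ? 0.0 : (double) changes / refreshes;
  }
}
```

With `Duration.ofHours(12)` as the maximum age, a relay seen in every
hourly consensus would be looked up twice a day; if `changedFraction()`
turns out higher than desired, the same class can be constructed with a
shorter duration.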
(This was issue 7 in my GitHub repository.)
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/5247>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online