[metrics-bugs] #26585 [Metrics/Onionoo]: improve AS number and name coverage (switch maxmind to RIPE Stat)
Tor Bug Tracker & Wiki
blackhole at torproject.org
Sun Jul 1 14:24:17 UTC 2018
#26585: improve AS number and name coverage (switch maxmind to RIPE Stat)
-----------------------------+------------------------------
Reporter: nusenu | Owner: metrics-team
Type: enhancement | Status: new
Priority: Medium | Milestone:
Component: Metrics/Onionoo | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: | Points:
Reviewer: | Sponsor:
-----------------------------+------------------------------
Comment (by irl):
For country codes, there are 321 relays where MaxMind and RIPEstat do not
give the same answer (counting a missing value as a disagreement) and 7837
where they agree (κ = 0.959 excluding relays for which MaxMind had no
country code). There were no relays for which RIPEstat did not return a
country code, but there were 21 relays for which MaxMind was missing a
country code. This leaves 300 relays for which both MaxMind and RIPEstat
had a country code but disagreed.
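As a rough illustration of the agreement statistic quoted above, here is a
minimal sketch of unweighted Cohen's κ over two lists of labels. This is an
assumption about how κ was computed, not a copy of the analysis script; the
function name `cohens_kappa` is hypothetical, and the inputs are assumed to
have had relays with missing values removed already.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Assumes missing values were filtered out beforehand, so every
    position holds a label from both sources.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("need at least one labelled relay")

    # Observed agreement: fraction of relays with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the marginal label frequencies,
    # summed over all labels (Counter returns 0 for unseen labels).
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both sources used a single identical label for every relay.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data only, not taken from the ticket.
maxmind = ["de", "de", "us", "fr"]
ripestat = ["de", "de", "us", "us"]
kappa = cohens_kappa(maxmind, ripestat)
```

If the 7837 agreeing and 300 disagreeing relays make up the whole
population after exclusion, the raw observed agreement is 7837 / 8137, or
about 0.963. κ is slightly lower because it discounts agreement expected
by chance.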
RIPEstat returns the country code "eu" for 7 relays and "ap" for 1 relay,
for Europe and Asia/Pacific respectively.
[[https://dev.maxmind.com/geoip/legacy/codes/iso3166/|MaxMind have
documentation]] indicating that they also use these codes, but MaxMind did
not return either code for any relay. In all 8 of these cases, MaxMind did
not have a country code.
Without ground truth to compare to, it is not possible to say whether
MaxMind or RIPEstat is correct in the cases where they disagree. It is
also possible that MaxMind and RIPEstat agree on a country code that is
incorrect.
For AS numbers, there are 269 relays where the two sources disagree and
7889 where they agree (κ = 0.979 excluding relays for which either MaxMind
or RIPEstat had no AS number). There were 101 relays for which MaxMind did
not return an AS number and 2 relays for which RIPEstat did not return an
AS number. Both of the relays for which RIPEstat did not return an AS
number were in the 1.0.0.0/8 BGP prefix, which has the "cn" country code
from RIPEstat but the "au" country code from MaxMind. MaxMind placed these
relays in AS 4804.
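The breakdown used in the two comparisons above (agree, disagree, missing
from one source) can be sketched as a simple tally. The function name
`tally_pairs`, the category names and the use of `None` for a missing
value are all assumptions made for illustration.

```python
from collections import Counter


def tally_pairs(pairs):
    """Classify (maxmind_value, ripestat_value) pairs into categories.

    A value of None means the source returned nothing for that relay.
    """
    counts = Counter()
    for maxmind_value, ripestat_value in pairs:
        if maxmind_value is None and ripestat_value is None:
            counts["both_missing"] += 1
        elif maxmind_value is None:
            counts["maxmind_missing"] += 1
        elif ripestat_value is None:
            counts["ripestat_missing"] += 1
        elif maxmind_value == ripestat_value:
            counts["agree"] += 1
        else:
            counts["disagree"] += 1
    return counts


# Toy data only, not taken from the ticket.
sample = [("de", "de"), ("au", "cn"), (None, "eu"), ("us", None)]
counts = tally_pairs(sample)
```

Keeping the "missing" categories separate from "disagree" makes it easier
to see how many relays would gain information from a second source, as
opposed to how many would receive conflicting information.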
It is not clear to me what our threshold on agreement should be. As the
MaxMind database is distributed to users and can be used, for example, to
disable/prefer the use of exit relays in specific countries, it may be
dangerous to users if they get inconsistent information about the country
code assigned to relays. It may be equally dangerous to assign incorrect
country codes, but without ground truth to compare to it is not possible
to say whether a switch would improve that situation or not.
We should conduct an analysis of the different databases and feeds
available to us, to determine which best fits our requirements. As for
querying RIPEstat, I have [[https://github.com/britram/canid|a tool]]
which I used in the above analysis and which would make integration into
Onionoo easier if we were to choose to integrate data from RIPEstat.
I don't believe we should consider outright replacing MaxMind with
RIPEstat, because we distribute this database to end clients and we need
a database that we are able to distribute. However, I can see that it
would be valuable to some users to have additional information where
MaxMind has none, and also to have the BGP prefix (finer-grained topology
information than just the AS).
What do you think about the addition of two new fields, 'country_source'
and 'as_source', to indicate the source of country/AS information? We could
then supplement the MaxMind data with data from RIPEstat where MaxMind
does not have the information, while making it clear to users where the
information has come from if that is important to them.
We could also add a 'bgp_prefix' field with prefix data from RIPEstat.
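To make the proposal concrete, here is a minimal sketch of the fallback
logic. Only the 'country_source', 'as_source' and 'bgp_prefix' field names
come from the proposal above; the function name `merge_lookups`, the
'country' and 'as_number' keys, and the "maxmind"/"ripestat" source values
are assumptions for illustration and not existing Onionoo code.

```python
def merge_lookups(maxmind, ripestat):
    """Prefer MaxMind values and fall back to RIPEstat where MaxMind
    has nothing, recording which source each value came from.

    Both arguments are plain dicts; a missing key or a falsy value
    means the source had no answer for this relay.
    """
    merged = {}
    for field, source_field in (
        ("country", "country_source"),
        ("as_number", "as_source"),
    ):
        if maxmind.get(field):
            merged[field] = maxmind[field]
            merged[source_field] = "maxmind"
        elif ripestat.get(field):
            merged[field] = ripestat[field]
            merged[source_field] = "ripestat"
        # If neither source has a value, both fields are left out.

    # The prefix is only available from RIPEstat in this sketch.
    if ripestat.get("bgp_prefix"):
        merged["bgp_prefix"] = ripestat["bgp_prefix"]
    return merged


# Illustrative input: MaxMind has no data, RIPEstat has a country only.
result = merge_lookups({}, {"country": "eu", "bgp_prefix": "192.0.2.0/24"})
```

With this ordering, RIPEstat never overrides a value that MaxMind
provides, so the country codes clients see from the distributed database
stay consistent with what Onionoo reports.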
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26585#comment:1>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online