[ooni-dev] How should we normalise DNS test results?
    Arturo Filastò 
    art at torproject.org
       
    Mon Jan 18 14:05:34 UTC 2016
    
    
  
Hi Tyler,
Thanks for your email!
> On Jan 14, 2016, at 04:47, Tyler Fisher <apt.get.apps at gmail.com> wrote:
> 
> Hello,
> 
> I am working on normalisation for all of the DNS based tests right now
> (i.e. dns_consistency, and dns_injection) and was wondering if any of
> you had any suggestions with regards to how we should be normalising
> these results.
> 
> So far, this is what I have come up with:
> 
> {'data_format_version': None,
> 'input': 'www.ignored.ch',
> 'options': ['-f', 'citizenlab-urls-global.txt', '-T',
> 'dns-server-ch.txt'],
> 'probe_asn': 'AS41715',
> 'probe_cc': 'CH',
> 'probe_ip': '127.0.0.1',
> 'report_filename':
> 's3://ooni-private/reports-raw/yaml/2016-01-01/dns_consistency-2015-12-31T220031Z-AS41715-probe.yamloo',
> 'report_id':
> 'bWEWmX6oEftSSJq9yEF5oH0VPOU5VZJooX06gQENo136sSoj9MzlTBk7EjhfH1Td',
> 'software_name': 'ooniprobe',
> 'software_version': '1.3.2',
> 'test_helpers': {'backend': '213.138.109.232:57004'},
> 'test_keys': {'annotations': None,
>                'backend_version': '1.1.4',
>                'control_resolver': '213.138.109.232:57004',
>                'errors': {'130.60.128.3': 'dns_lookup_error',
>                           '130.60.128.5': 'dns_lookup_error',
>                           '194.158.230.53': False,
>                           '194.230.1.5': False,
>                           '82.195.224.5': 'no_answer'},
>                'failed': {'130.60.128.3',
>                           '130.60.128.5',
>                           '82.195.224.5'},
>                'input_hashes':
> ['3f786850e387550fdab836ed7e6dc881de23001b'],
>                'queries': [{'failure': None,
>                             'hostname': 'www.ignored.ch',
>                             'query_type': 'A',
>                             'resolver_hostname': '213.138.109.232',
>                             'resolver_port': 57004},
>                            {'failure': None,
>                             'hostname': 'www.ignored.ch',
>                             'query_type': 'A',
>                             'resolver_hostname': '212.147.10.10',
>                             'resolver_port': 53}],
>                'successful': {'194.158.230.53',
>                               '194.230.1.5',
>                               '195.186.1.111',
>                               '81.221.252.10'}},
> 'test_name': 'dns_consistency',
> 'test_runtime': 32.54842686653137,
> 'test_start_time': 1451605073.0,
> 'test_version': '0.6'}
> 
> After looking into the source code for the DNS consistency test and
> the dnst template, I was able to determine the subject of the DNS
> query. However, I am not sure how to handle the addr. section, which
> changes depending on whether the associated DNS query has a type of
> A/SOA/NS (see:
> https://github.com/TheTorProject/ooni-probe/blob/master/ooni/templates/dnst.py#L153).
> 
> If you have any suggestions with regards to how to normalise dnst
> results, I've linked to the raw, and normalised reports below.
> 
> Gist: https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a
> Normalisation routine:
> https://gist.github.com/TylerJFisher/7372f9c31c54b5207d2a#file-normalise-py
I think the way you have normalised the dns_consistency test is much better, and I think that we should eventually integrate this data format directly into the ooni-probe tests themselves so that we don’t have to do any further normalisation, which is error prone, on future reports.
I am a bit torn as to how to resolve the addrs key issue: on the one hand I like the idea of not having to dig too much into the answers array to extract the stuff I am interested in, but on the other hand it’s probably best to have things be as consistent as possible.
I think the best option is probably to just merge the “addrs” and “answers” into one list and make the items of the list change depending on the type of query (there is no cleaner way around this since the RDATA field in DNS is made this way).
I would say every item in the answers list has a “ttl” key; the rest depends on the type of query, like so:
* A = “answers”: [{“ipv4”: “xxx.xxx.xxx.xxx”}, {“ipv4”: “xxx.xxx.xxx.xxx”}]
* PTR, NS = “answers”: [{“hostname”: “xxx.yyy”}, {“hostname”: “xxx.yyy”}]
* MX = “answers”: [{“preference”: int, “hostname”: “xxx.yyy”}, {“preference”: int, “hostname”: “xxx.yyy”}]
* SOA = “answers”: [{“serial_number”: int, “refresh_interval”: int, “retry_interval”: int, “expiration_limit”: int, “minimum_ttl”: int, “hostname”: “xxx.yyy”, “responsible_name”: “xxx.yyy.zzz”}, …]
Note: For SOA queries we currently don’t collect all the above mentioned data in ooni-probe, but since we are going to change the data format anyways we may as well change it in a way that is future proof.
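To make the proposed layout concrete, here is a rough Python sketch of how one item of the merged “answers” list could be built per query type. It is purely illustrative: the function name normalise_answer, the (query_type, ttl, rdata) input shape, and the choice to map the SOA primary name server onto “hostname” are assumptions made for this example, not existing ooni-probe code.

```python
# Hypothetical sketch of the proposed "answers" item layout.
# normalise_answer and its input shape are assumptions, not ooni-probe API.

SOA_INT_FIELDS = (
    "serial_number",
    "refresh_interval",
    "retry_interval",
    "expiration_limit",
    "minimum_ttl",
)


def normalise_answer(query_type, ttl, rdata):
    """Build one item of the merged "answers" list.

    rdata is assumed to be:
      A        -> an IPv4 address string
      PTR, NS  -> a hostname string
      MX       -> a (preference, hostname) pair
      SOA      -> (hostname, responsible_name, serial, refresh,
                   retry, expire, minimum)
    """
    answer = {"ttl": ttl}  # every item carries a "ttl" key
    if query_type == "A":
        answer["ipv4"] = rdata
    elif query_type in ("PTR", "NS"):
        answer["hostname"] = rdata
    elif query_type == "MX":
        preference, hostname = rdata
        answer["preference"] = int(preference)
        answer["hostname"] = hostname
    elif query_type == "SOA":
        hostname, responsible_name = rdata[0], rdata[1]
        answer["hostname"] = hostname  # assumed: primary name server
        answer["responsible_name"] = responsible_name
        for key, value in zip(SOA_INT_FIELDS, rdata[2:]):
            answer[key] = int(value)
    else:
        raise ValueError("unsupported query type: %r" % (query_type,))
    return answer


# Example: two MX records merged into a single "answers" list.
answers = [
    normalise_answer("MX", 300, (10, "mx1.example.org")),
    normalise_answer("MX", 300, (20, "mx2.example.org")),
]
```

Keeping “ttl” on every item and branching only on the query type means a consumer can iterate over “answers” uniformly and only look at type-specific keys when it needs them.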
Do you think this makes sense?
~ Arturo
    
    