[or-cvs] r20734: {projects} Add a Python version of the directory archive parsing script (projects/archives/trunk/exonerator)

kloesing at seul.org
Sat Oct 3 15:40:31 UTC 2009


Author: kloesing
Date: 2009-10-03 11:40:31 -0400 (Sat, 03 Oct 2009)
New Revision: 20734

Added:
   projects/archives/trunk/exonerator/exonerator.py
Modified:
   projects/archives/trunk/exonerator/ExoneraTor.java
   projects/archives/trunk/exonerator/HOWTO
Log:
Add a Python version of the directory archive parsing script.


Modified: projects/archives/trunk/exonerator/ExoneraTor.java
===================================================================
--- projects/archives/trunk/exonerator/ExoneraTor.java	2009-10-02 15:57:37 UTC (rev 20733)
+++ projects/archives/trunk/exonerator/ExoneraTor.java	2009-10-03 15:40:31 UTC (rev 20734)
@@ -377,16 +377,16 @@
         if (inTooOldConsensuses && !inTooNewConsensuses)
           System.out.println("\nNote that we found a matching relay in "
               + "consensuses that were published between 5:00 and 3:00 "
-              + "hours before " + timestampStr + ". ");
+              + "hours before " + timestampStr + ".");
         else if (!inTooOldConsensuses && inTooNewConsensuses)
           System.out.println("\nNote that we found a matching relay in "
               + "consensuses that were published up to 2:00 hours after "
-              + timestampStr + ". ");
+              + timestampStr + ".");
         else
           System.out.println("\nNote that we found a matching relay in "
               + "consensuses that were published between 5:00 and 3:00 "
               + "hours before and in consensuses that were published up "
-              + "to 2:00 hours after " + timestampStr + ". ");
+              + "to 2:00 hours after " + timestampStr + ".");
         System.out.println("Make sure that the timestamp you provided is "
             + "in the correct timezone: UTC (or GMT).");
       }

Modified: projects/archives/trunk/exonerator/HOWTO
===================================================================
--- projects/archives/trunk/exonerator/HOWTO	2009-10-02 15:57:37 UTC (rev 20733)
+++ projects/archives/trunk/exonerator/HOWTO	2009-10-03 15:40:31 UTC (rev 20734)
@@ -24,14 +24,51 @@
 prints out all intermediate steps in answering this, so that users can
 confirm the correctness of the result themselves.
 
+This script is available in two versions, one written in Python and one in
+Java, with equivalent functionality.
+
 ---------------------------------------------------------------------------
 
-Quick Start:
+Python Quick Start:
 
-In order to run this script, you need to install and download the following
-software and data (please note that all instructions are written for Linux;
-commands for Windows or Mac OS X may vary):
+In order to run the Python version of this script, you need to install and
+download the following software and data (please note that all instructions
+are written for Linux; commands for Windows or Mac OS X may vary):
 
+- Install Python 2.6.2 or higher. (Previous Python versions might work,
+  too, but have not been tested.)
+
+- Copy the consensuses-* and server-descriptors-* files of the relevant
+  time from http://archive.torproject.org/tor-directory-authority-archive/
+  and extract them to a directory in your working directory, e.g.
+  /home/you/exonerator/data/ . Don't rename the extracted directories or
+  any of the contained files, or the script won't find the contained
+  descriptors.
+
+- Run the script, providing it with the parameters it needs:
+
+  python exonerator.py <descriptor archive directory>
+           <IP address in question>
+           <timestamp, in UTC, formatted as YYYY-MM-DD hh:mm:ss>
+           [<target address>[:<target port>]]
+
+  Make sure that the timestamp is provided in UTC, which is similar to GMT,
+  and not in your local timezone! Otherwise, results will very likely be
+  wrong.
+
+  A sample invocation might be:
+
+  $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+        209.85.129.104:80
+
+---------------------------------------------------------------------------
+
+Java Quick Start:
+
+In order to run the Java version of this script, you need to install and
+download the following software and data (please note that all instructions
+are written for Linux; commands for Windows or Mac OS X may vary):
+
 - Install Java 6 or higher.
 
 - Download the BouncyCastle provider that includes Base 64 decoding from
@@ -80,26 +117,34 @@
 
 - Positive result of echelon1+2 being a relay:
 
+  $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00
   $ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
         2009-08-15 16:05:00
 
 - Positive result of echelon1+2 exiting to google.com on any port
 
+  $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+        209.85.129.104
   $ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
         2009-08-15 16:05:00 209.85.129.104
 
 - Positive result of echelon1+2 exiting to google.com on port 80
 
+  $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+        209.85.129.104:80
   $ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
         2009-08-15 16:05:00 209.85.129.104:80
 
 - Negative result of echelon1+2 exiting to google.com, but not on port 25
 
+  $ python exonerator.py data/ 209.17.171.104 2009-08-15 16:05:00 \
+        209.85.129.104:25
   $ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.104 \
         2009-08-15 16:05:00 209.85.129.104:25
 
 - Negative result with IP address of echelon1+2 changed in the last octet
 
+  $ python exonerator.py data/ 209.17.171.50 2009-08-15 16:05:00
   $ java -cp .:bcprov-jdk16-143.jar ExoneraTor data/ 209.17.171.50 \
         2009-08-15 16:05:00
 

Added: projects/archives/trunk/exonerator/exonerator.py
===================================================================
--- projects/archives/trunk/exonerator/exonerator.py	                        (rev 0)
+++ projects/archives/trunk/exonerator/exonerator.py	2009-10-03 15:40:31 UTC (rev 20734)
@@ -0,0 +1,369 @@
+#!/usr/bin/env python
+# Copyright 2009 The Tor Project -- see LICENSE for licensing information
+
+import binascii
+import os
+import sys
+import time
+
+# check parameters
+if len(sys.argv) not in (5, 6):
+    print "\nUsage: python exonerator.py <descriptor archive directory> " \
+          "<IP address in question> <timestamp, in UTC, formatted as " \
+          "YYYY-MM-DD hh:mm:ss> [<target address>[:<target port>]]\n"
+    sys.exit()
+archiveDirectory = sys.argv[1]
+if not os.path.isdir(archiveDirectory):
+    print "\nDescriptor archive directory %s does not exist or is not a " \
+          "directory.\n" % os.path.abspath(archiveDirectory)
+    sys.exit()
+archiveDirectory = os.path.normpath(archiveDirectory)
+relayIP = sys.argv[2]
+timestampStr = "%s %s" % (sys.argv[3], sys.argv[4])
+os.environ['TZ'] = 'UTC'
+time.tzset()
+timestamp = time.strptime(timestampStr, "%Y-%m-%d %H:%M:%S")
+# if a target is given, parse address and possibly port part of it
+target = None
+targetIP = None
+targetPort = None
+if len(sys.argv) == 6:
+    target = sys.argv[5]
+    targetParts = target.split(":")
+    targetIP = targetParts[0]
+    if len(targetParts) == 2:
+        targetPort = targetParts[1]
+    targetIPParts = targetIP.split(".")
+DELIMITER = "-----------------------------------------------------------" \
+            "----------------"
+targetHelpStr = ""
+if target:
+    targetHelpStr = " permitting exiting to %s" % target
+print "\nTrying to find out whether %s was running a Tor relay at " \
+      "%s%s...\n\n%s\n" % (relayIP, timestampStr, targetHelpStr, DELIMITER)
+
+# check that we have the required archives
+timestampTooOld = time.gmtime(time.mktime(timestamp) - 300 * 60)
+timestampFrom = time.gmtime(time.mktime(timestamp) - 180 * 60)
+timestampTooNew = time.gmtime(time.mktime(timestamp) + 120 * 60)
+timestampTooOldStr = time.strftime("%Y-%m-%d %H:%M:%S", timestampTooOld)
+timestampFromStr = time.strftime("%Y-%m-%d %H:%M:%S", timestampFrom)
+timestampTooNewStr = time.strftime("%Y-%m-%d %H:%M:%S", timestampTooNew)
+print "\nChecking that relevant archives between %s and %s are " \
+      "available..." % (timestampTooOldStr, timestampTooNewStr)
+
+requiredDirs = set()
+requiredDirs.add(time.strftime("consensuses-%Y-%m", timestampTooOld))
+requiredDirs.add(time.strftime("consensuses-%Y-%m", timestampTooNew))
+if target is not None:
+    requiredDirs.add(time.strftime("server-descriptors-%Y-%m",
+                                      timestampTooOld))
+    requiredDirs.add(time.strftime("server-descriptors-%Y-%m",
+                                      timestampTooNew))
+
+consensusDirs = list()
+descriptorsDirs = list()
+directoriesLeftToParse = list()
+directoriesLeftToParse.append(archiveDirectory)
+
+while len(directoriesLeftToParse) > 0:
+    directoryOrFile = directoriesLeftToParse.pop()
+    basename = os.path.basename(directoryOrFile)
+    if basename.startswith("consensuses-"):
+        if basename in requiredDirs:
+            requiredDirs.remove(basename)
+            consensusDirs.append(directoryOrFile)
+    elif basename.startswith("server-descriptors-"):
+        if basename in requiredDirs:
+            requiredDirs.remove(basename)
+            descriptorsDirs.append(directoryOrFile)
+    else:
+        for filename in os.listdir(directoryOrFile):
+            entry = "%s/%s" % (directoryOrFile, filename)
+            if os.path.isdir(entry):
+                directoriesLeftToParse.append(entry)
+
+consensusDirs.sort()
+for file in consensusDirs:
+    print "  %s" % file
+descriptorsDirs.sort()
+for file in descriptorsDirs:
+    print "  %s" % file
+
+if len(requiredDirs) > 0:
+    print "\nWe are missing consensuses and/or server descriptors. " \
+          "Please download these archives and extract them to your data " \
+          "directory. Be sure NOT to rename the extracted directories " \
+          "or the contained files."
+    missingFiles = list()
+    for file in sorted(requiredDirs):
+        print "  %s.tar.bz2" % file
+    sys.exit()
+
+# look for consensus files
+print "\nLooking for relevant consensuses between %s and %s..." % \
+      (timestampFromStr, timestampStr)
+tooOldConsensuses = set()
+relevantConsensuses = set()
+tooNewConsensuses = set()
+directoriesLeftToParse = list()
+for file in consensusDirs:
+    directoriesLeftToParse.append(file)
+while len(directoriesLeftToParse) > 0:
+    directoryOrFile = directoriesLeftToParse.pop()
+    if os.path.isdir(directoryOrFile):
+        for filename in os.listdir(directoryOrFile):
+            entry = "%s/%s" % (directoryOrFile, filename)
+            directoriesLeftToParse.append(entry)
+    else:
+        basename = os.path.basename(directoryOrFile)
+        if (basename.endswith("consensus")):
+            consensusTime = time.strptime(basename[0:19],
+                                          "%Y-%m-%d-%H:%M:%S")
+            if consensusTime >= timestampTooOld and \
+               consensusTime < timestampFrom:
+                tooOldConsensuses.add(directoryOrFile)
+            elif consensusTime >= timestampFrom and \
+                 consensusTime <= timestamp:
+                relevantConsensuses.add(directoryOrFile)
+            elif consensusTime > timestamp and \
+                 consensusTime <= timestampTooNew:
+                tooNewConsensuses.add(directoryOrFile)
+allConsensuses = set()
+for file in tooOldConsensuses:
+    allConsensuses.add(file)
+for file in relevantConsensuses:
+    allConsensuses.add(file)
+for file in tooNewConsensuses:
+    allConsensuses.add(file)
+if len(allConsensuses) == 0:
+    print "  None found!\n\n%s\n\nResult is INDECISIVE!\n\nWe cannot " \
+          "make any statement about IP address %s being a relay at %s " \
+          "or not! We did not find any relevant consensuses preceding " \
+          "the given time. This either means that you did not download " \
+          "and extract the consensus archives preceding the hours " \
+          "before the given time, or (in rare cases) that the directory " \
+          "archives are missing the hours before the timestamp. Please " \
+          "check that your directory archives contain consensus files " \
+          "of the interval 5:00 hours before and 2:00 hours after the " \
+          "time you are looking for.\n" % \
+          (DELIMITER, relayIP, timestampStr)
+    sys.exit()
+for file in sorted(relevantConsensuses):
+    print "  %s" % file
+
+# parse consensuses to find descriptors belonging to the IP address
+print "\nLooking for descriptor identifiers referenced in \"r \" lines " \
+      "in these consensuses containing IP address %s..." % relayIP
+positiveConsensusesNoTarget = set()
+addressesInSameNetwork = set()
+relevantDescriptors = dict()
+for consensus in allConsensuses:
+    if consensus in relevantConsensuses:
+        print "  %s" % consensus
+    file = open(consensus, "r")
+    line = file.readline()
+    while line:
+        if line.startswith("r "):
+            address = line.split(" ")[6]
+            if address == relayIP:
+                hexDesc = binascii.b2a_hex(binascii.a2b_base64(
+                                           line.split(" ")[3] + "=="))
+                if hexDesc not in relevantDescriptors.keys():
+                    relevantDescriptors[hexDesc] = set()
+                relevantDescriptors[hexDesc].add(consensus)
+                positiveConsensusesNoTarget.add(consensus)
+                if consensus in relevantConsensuses:
+                    print "    \"%s\" references descriptor %s" % \
+                          (line.rstrip(), hexDesc)
+            elif relayIP.startswith(address[0:address.rfind(".")]):
+                addressesInSameNetwork.add(address)
+        line = file.readline()
+    file.close()
+if len(relevantDescriptors) == 0:
+    print "  None found!\n\n%s\n\nResult is NEGATIVE with moderate " \
+          "certainty!\n\nWe did not find IP address %s in any of the " \
+          "consensuses that were published between %s and %s.\n\nA " \
+          "possible reason for false negatives is that the relay is " \
+          "using a different IP address when generating a descriptor " \
+          "than for exiting to the Internet. We hope to provide better " \
+          "checks for this case in the future." % \
+          (DELIMITER, relayIP, timestampTooOldStr, timestampTooNewStr)
+    if len(addressesInSameNetwork) > 0:
+        print "\nThe following other IP addresses of Tor relays were " \
+              "found in the mentioned consensus files that are in the " \
+              "same /24 network and that could be related to IP address " \
+              "%s:" % relayIP
+        for addr in addressesInSameNetwork:
+            print "  %s" % addr
+    print ""
+    sys.exit()
+
+# parse router descriptors to check exit policies
+positiveConsensuses = set()
+missingDescriptors = set()
+if target is not None:
+    print "\nChecking if referenced descriptors permit exiting to " \
+          "%s..." % target
+    descriptors = relevantDescriptors.keys()
+    for desc in descriptors:
+        missingDescriptors.add(desc)
+    directoriesLeftToParse = list()
+    for descriptorsDir in descriptorsDirs:
+        directoriesLeftToParse.append(descriptorsDir)
+    while len(directoriesLeftToParse) > 0:
+        directoryOrFile = directoriesLeftToParse.pop()
+        if os.path.isdir(directoryOrFile):
+            for filename in os.listdir(directoryOrFile):
+                entry = "%s/%s" % (directoryOrFile, filename)
+                directoriesLeftToParse.append(entry)
+        else:
+            basename = os.path.basename(directoryOrFile)
+            for descriptor in descriptors:
+                if basename == descriptor:
+                    missingDescriptors.remove(descriptor)
+                    file = open(directoryOrFile, "r")
+                    line = file.readline()
+                    while line:
+                        if line.startswith("reject ") or \
+                           line.startswith("accept "):
+                            ruleAccept = line.split()[0] == "accept"
+                            ruleAddress = line.split()[1].split(":")[0]
+                            if ruleAddress != "*":
+                                if '/' not in ruleAddress and \
+                                   ruleAddress != targetIP:
+                                    # IP address does not match
+                                    line = file.readline()
+                                    continue
+                                ruleIPParts = ruleAddress.split("/")[0]. \
+                                              split(".")
+                                ruleNetwork = int((ruleAddress + "/32"). \
+                                              split("/")[1])
+                                for i in range(0, 4):
+                                    if ruleNetwork == 0:
+                                        break
+                                    elif ruleNetwork >= 8:
+                                        if ruleIPParts[i] == \
+                                           targetIPParts[i]:
+                                            ruleNetwork -= 8
+                                        else:
+                                            break
+                                    else:
+                                        mask = 255 ^ 255 >> ruleNetwork
+                                        if int(ruleIPParts[i]) & mask == \
+                                           int(targetIPParts[i]) & mask:
+                                            ruleNetwork = 0
+                                        break
+                                if ruleNetwork > 0:
+                                    # IP address does not match
+                                    line = file.readline()
+                                    continue
+                            rulePort = line.split()[1].split(":")[1]
+                            if targetPort is None and not ruleAccept and \
+                               rulePort != "*":
+                                # with no port given, we only consider
+                                # reject :* rules as matching
+                                line = file.readline()
+                                continue
+                            if targetPort and rulePort != "*" and \
+                               targetPort != rulePort:
+                                # ports do not match
+                                line = file.readline()
+                                continue
+                            relevantMatch = False
+                            for f in relevantDescriptors.get(descriptor):
+                                if f in relevantConsensuses:
+                                    relevantMatch = True
+                            if relevantMatch:
+                                if ruleAccept:
+                                    print "  %s permits exiting to %s " \
+                                          "according to rule \"%s\"" % \
+                                          (directoryOrFile, target,
+                                          line.rstrip())
+                                else:
+                                    print "  %s does not permit exiting " \
+                                          "to %s according to rule " \
+                                          "\"%s\"" % (directoryOrFile,
+                                          target, line.rstrip())
+                            if ruleAccept:
+                                for consensus in \
+                                    relevantDescriptors.get(descriptor):
+                                    positiveConsensuses.add(consensus)
+                            break
+                        line = file.readline()
+                    file.close()
+
+# print out result
+matches = None
+if target:
+    matches = positiveConsensuses
+else:
+    matches = positiveConsensusesNoTarget
+lastConsensus = (sorted(relevantConsensuses) or [None])[-1]
+if lastConsensus in matches:
+    print "\n%s\n\nResult is POSITIVE with high certainty!\n\nWe found " \
+          "one or more relays on IP address %s%s in the most recent " \
+          "consensus preceding %s that clients were likely to know.\n" % \
+          (DELIMITER, relayIP, targetHelpStr, timestampStr)
+    sys.exit()
+resultIndecisive = target and len(missingDescriptors) > 0
+if resultIndecisive:
+    print "\n%s\n\nResult is INDECISIVE!\n\nAt least one referenced " \
+          "descriptor could not be found. This is a rare case, but one " \
+          "that (apparently) happens. We cannot make any good statement " \
+          "about exit relays without these descriptors. The following " \
+          "descriptors are missing:" % DELIMITER
+    for desc in missingDescriptors:
+        print "  %s" % desc
+inOtherRelevantConsensus = False
+inTooOldConsensuses = False
+inTooNewConsensuses = False
+for f in matches:
+    if f in relevantConsensuses:
+        inOtherRelevantConsensus = True
+    elif f in tooOldConsensuses:
+        inTooOldConsensuses = True
+    elif f in tooNewConsensuses:
+        inTooNewConsensuses = True
+if inOtherRelevantConsensus:
+    if not resultIndecisive:
+        print "\n%s\n\nResult is POSITIVE with moderate certainty!" % \
+              DELIMITER
+    print "\nWe found one or more relays on IP address %s%s, but not in " \
+          "the consensus immediately preceding %s. A possible reason " \
+          "for the relay being missing in the last consensus preceding " \
+          "the given time might be that some of the directory " \
+          "authorities had difficulties connecting to the relay. " \
+          "However, clients might still have used the relay." % (relayIP,
+          targetHelpStr, timestampStr)
+else:
+    if not resultIndecisive:
+        print "\n%s\n\nResult is NEGATIVE with high certainty!" % \
+              DELIMITER
+    print "\nWe did not find any relay on IP address %s%s in the " \
+          "consensuses 3:00 hours preceding %s." % (relayIP, targetHelpStr,
+          timestampStr)
+    if inTooOldConsensuses or inTooNewConsensuses:
+        if inTooOldConsensuses and not inTooNewConsensuses:
+            print "\nNote that we found a matching relay in consensuses " \
+                  "that were published between 5:00 and 3:00 hours " \
+                  "before %s." % timestampStr
+        elif not inTooOldConsensuses and inTooNewConsensuses:
+            print "\nNote that we found a matching relay in consensuses " \
+                  "that were published up to 2:00 hours after %s." % \
+                  timestampStr
+        else:
+            print "\nNote that we found a matching relay in consensuses " \
+                  "that were published between 5:00 and 3:00 hours " \
+                  "before and in consensuses that were published up to " \
+                  "2:00 hours after %s." % timestampStr
+        print "Make sure that the timestamp you provided is in the " \
+              "correct timezone: UTC (or GMT)."
+if target:
+    if len(positiveConsensuses) == 0 and \
+       len(positiveConsensusesNoTarget) > 0:
+        print "\nNote that although the found relay(s) did not permit " \
+              "exiting to %s there have been one or more relays running " \
+              "at the given time." % target
+print ""
+


Property changes on: projects/archives/trunk/exonerator/exonerator.py
___________________________________________________________________
Added: svn:executable
   + *

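For readers following the consensus selection in exonerator.py, here is a minimal sketch of the three time windows it uses (5:00 to 3:00 hours before, the 3:00 hours before, and up to 2:00 hours after the timestamp). It is written for Python 3 with datetime rather than the Python 2 time module used in the commit. The names classify_consensus and parse_utc are illustrative assumptions and do not appear in the script.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse a timestamp string and mark it explicitly as UTC."""
    return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)


def classify_consensus(consensus_time, timestamp):
    """Return 'too_old', 'relevant', 'too_new', or None (outside all windows)."""
    too_old = timestamp - timedelta(hours=5)
    start = timestamp - timedelta(hours=3)
    too_new = timestamp + timedelta(hours=2)
    if too_old <= consensus_time < start:
        return "too_old"
    if start <= consensus_time <= timestamp:
        return "relevant"
    if timestamp < consensus_time <= too_new:
        return "too_new"
    return None


# Consensus file names begin with a 19-character timestamp,
# e.g. "2009-08-15-14:00:00-consensus".
ts = parse_utc("2009-08-15 16:05:00")
name = "2009-08-15-14:00:00-consensus"
consensus_time = parse_utc(name[:19], "%Y-%m-%d-%H:%M:%S")
print(classify_consensus(consensus_time, ts))
```

Attaching the UTC zone explicitly avoids having to set the TZ environment variable and call time.tzset(), which the committed script does.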

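The script turns the base64 digest found in a consensus "r " line into the hex string that names the descriptor file. The following Python 3 sketch shows that step in isolation. The helper names and the sample line are hypothetical. The field positions (digest at index 3, address at index 6) are taken from the script above.

```python
import base64
import binascii


def descriptor_digest_hex(r_line):
    """Return the hex form of the unpadded base64 digest in an 'r ' line."""
    b64 = r_line.split(" ")[3]
    # Consensus entries omit base64 padding; restore it before decoding.
    padded = b64 + "=" * (-len(b64) % 4)
    return binascii.hexlify(base64.b64decode(padded)).decode("ascii")


def relay_address(r_line):
    """Return the IP address field of an 'r ' line."""
    return r_line.split(" ")[6]


# Build a made-up "r " line from a known 20-byte digest (illustrative only).
digest = bytes(range(20))
b64_digest = base64.b64encode(digest).decode("ascii").rstrip("=")
sample = "r example %s %s 2009-08-15 15:00:00 192.0.2.1 9001 0" % (
    b64_digest, b64_digest)
print(relay_address(sample), descriptor_digest_hex(sample))
```

Computing the padding from the string length is an assumption of this sketch. The committed script always appends "==" instead.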

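The exit policy check in exonerator.py compares address octets by hand. As a rough alternative, the Python 3 sketch below does the same first-match evaluation with the standard ipaddress module. It handles IPv4 only and, like the script, does not handle port ranges. The function names and the sample policy are assumptions made for illustration, not the policy of any real relay.

```python
import ipaddress


def rule_matches(rule_line, target_ip, target_port=None):
    """Return True if an 'accept'/'reject' rule applies to the target."""
    action, pattern = rule_line.split()[:2]
    rule_addr, rule_port = pattern.split(":")
    if rule_addr != "*":
        # A bare address is treated as a /32 network.
        network = ipaddress.ip_network(rule_addr, strict=False)
        if ipaddress.ip_address(target_ip) not in network:
            return False
    if target_port is None:
        # With no port given, a reject rule only counts if it covers all ports.
        return action == "accept" or rule_port == "*"
    return rule_port == "*" or rule_port == str(target_port)


def exit_permitted(policy_lines, target_ip, target_port=None):
    """Apply the first matching rule; return None if no rule matches."""
    for line in policy_lines:
        if line.startswith(("accept ", "reject ")) and \
           rule_matches(line, target_ip, target_port):
            return line.startswith("accept ")
    return None


# Hypothetical policy, for illustration only.
policy = ["reject 0.0.0.0/8:*", "reject *:25", "accept *:*"]
print(exit_permitted(policy, "209.85.129.104", 80))
print(exit_permitted(policy, "209.85.129.104", 25))
```

With this made-up policy, the two calls mirror the "port 80" and "port 25" cases from the HOWTO examples.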