[tor-commits] [onionoo/master] squash! Implements task-20596: use metrics-base and reduced build.xml,
karsten at torproject.org
Thu Jan 5 14:28:01 UTC 2017
commit 46d5c6489db728676b1993be3515763d1459e046
Author: iwakeh <iwakeh at torproject.org>
Date: Tue Jan 3 17:35:28 2017 +0100
squash! Implements task-20596: use metrics-base and reduced build.xml,
added executable bootstrap script. Removed obsolete DESIGN document and
metrics_checks.xml.
---
DESIGN | 157 --------------------
build.xml | 2 +-
src/main/resources/bootstrap-development.sh | 0
src/test/resources/metrics_checks.xml | 219 ----------------------------
4 files changed, 1 insertion(+), 377 deletions(-)
diff --git a/DESIGN b/DESIGN
deleted file mode 100644
index 6afbdb3..0000000
--- a/DESIGN
+++ /dev/null
@@ -1,157 +0,0 @@
-Onionoo design document
-=======================
-
-This short document describes Onionoo's design in a mostly informal and
-language-independent way. The goal is to be able to discuss design
-decisions with non-Java programmers and to provide a blueprint for porting
-Onionoo to other programming languages. This document cannot describe all
-the details, but it can provide a rough overview.
-
-There are two main building blocks of Onionoo that are described here:
-
- 1) an hourly cronjob processing newly published Tor descriptors and
-
- 2) a web service component answering client requests.
-
-The interface between the two building blocks is a directory in the local
-file system that can be read and written by component 1 and can be read by
-component 2. In theory, the two components can be implemented in two
-entirely different programming languages. In a possible port from Java to
-another programming language, the two components can easily be ported
-subsequently.
-
-The purpose of the hourly batch processor is to read updated Tor
-descriptors from the metrics service and transform them to be read by the
-web service component. Answering a client request in component 2 of
-Onionoo needs to be highly efficient, which is why any data aggregation
-needs to happen beforehand. Parsing descriptors on-the-fly is not an
-option.
-
-The hourly batch processor is run in a cron job at :15 every hour that
-usually takes up to five minutes and that contains the following substeps:
-
- 1.1) Rsync new Tor descriptors from metrics.
-
- 1.2) Read previously stored status data about relays and bridges that
- have been running in the last seven days to memory. These data
- include for each relay or bridge: nickname, fingerprint, primary
- OR address and port, additional OR addresses and ports, exit
- addresses, network status publication time, directory port, relay
- flags, consensus weight, country code, host name as obtained by
- reverse domain name lookup, and timestamp of last reverse domain
- name lookup.
-
- 1.3) Import any new relay network status consensuses that have been
- published since the last run.
-
- 1.4) Set the running bit for all relays that are contained in the last
- known relay network status consensus.
-
- 1.5) Look up all relay IP addresses in a local GeoIP database and in a
- local AS number database. Extract country codes and names, city
- names, geo coordinates, AS name and number, etc.
-
- 1.6) Import any new bridge network statuses that have been published
- since the last run.
-
- 1.7) Start reverse domain name lookups for all relay IP addresses. Run
- in background, only refresh lookups for previously looked up IP
- addresses every 12 hours, run up to five lookups in parallel, and
- set timeouts for single requests and for the general lookup
- process. In theory, this step could happen a few steps before,
- but not before step 1.3.
-
- 1.8) Import any new relay server descriptors that have been published
- since the last run.
-
- 1.9) Import any new exit lists that have been published since the last
- run.
-
- 1.10) Import any new bridge server descriptors that have been published
- since the last run.
-
- 1.11) Import any new bridge pool assignments that have been published
- since the last run.
-
- 1.12) Make sure that reverse domain name lookups are finished or the
- timeout for running lookups has expired. This step cannot happen
- at any time later than step 1.13 and shouldn't happen long before.
-
- 1.13) Rewrite all details files that have changed. Details files
- combine information from all previously imported descriptor
- types, database lookups, and performed reverse domain name
- lookups. The web service component needs to be able to retrieve a
- details file for a given relay or bridge without grabbing
- information from different data sources. It's best to write the
- details file part for a given relay or bridge to a single file in
- the target JSON format, saved under the relay's or bridge's
- fingerprint. If a database is used, the raw string should be
- saved for faster processing.
-
- 1.14) Import relays' and bridges' bandwidth histories from extra-info
- descriptors that have been published since the last run. There
- must be internally stored bandwidth histories for each relay and
- bridge, regardless of whether they have been running in the last
- seven days. The original bandwidth histories, which are available
- on 15-minute detail, can be aggregated to longer time periods the
- farther the interval lies in the past. The internal bandwidth
- histories are different from the bandwidth files described in 1.15
- which are written to be given out to clients.
-
- 1.15) Rewrite bandwidth files that have changed. Bandwidth files
- aggregate bandwidth history information on varying levels of
- detail, depending on how far observations lie in the past. It's
- necessary to write JSON-formatted bandwidth files for all relays
- and bridges in the hourly cronjob. Any attempts to process years
- of bandwidth data while answering a web request can only fail.
- The previously aggregated bandwidth files are stored under the
- relay's or bridge's fingerprint for quick lookup.
-
- 1.16) Update the summary file listing all relays and bridges that have
- been running in the last seven days which was previously read in
- step 1.2. This is the last step in the hourly process. The web
- service component checks the modification time of this file to
- decide whether it needs to reload its view on the network. If
- this step was not the last step, the web service component might
- list relays or bridges for which there are no details or bandwidth
- files available yet. (With the approach taken here, it's
- conceivable that a bandwidth file of a relay or bridge that hasn't
- been running for a week has been deleted before step 1.16. This
- case has been found acceptable, because it's highly unlikely. If
- a database had been used, steps 1.2 to 1.16 would have
- happened in a single database transaction.)
-
-The web service component has the purpose of answering client requests.
-It uses previously prepared data from the hourly cronjob to respond to
-requests very quickly.
-
-During initialization, or whenever the hourly cronjob has finished, the
-web service component does the following substeps:
-
- 2.1) Read the summary file that was produced by the hourly cronjob in
- step 1.16.
-
- 2.2) Keep the list of relays and bridges in memory, including all
- information that is used for filtering or sorting results.
-
- 2.3) Prepare summary lines for all relays and bridges. The summary
- resource is a JSON file with a single line per relay or bridge.
- This line contains only very few fields as compared to details
- files that a client might use for further filtering results.
-
-When responding to a request, the web service component does the following
-steps:
-
- 2.4) Parse request and its parameters.
-
- 2.5) Possibly filter relays and bridges.
-
- 2.6) Possibly re-order and limit results.
-
- 2.7) Write response or error code.
-
-Again (and this can hardly be overstated!), steps 2.4 to 2.7 need to
-happen *extremely* fast. Any steps that go beyond file system reads or
-simple database lookups need to happen either in the hourly cronjob (1.1
-to 1.16) or in the web service component initialization (2.1 to 2.3).
-
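The deleted DESIGN document hinges on one ordering rule: the hourly batch processor writes the per-relay files first and the summary file last (step 1.16), and the web service reloads its view only when the summary file's modification time changes (steps 2.1 to 2.3). The Java sketch below illustrates that handshake; it is not Onionoo's actual code. The class names BatchWriter and SummaryCache, the file names "summary" and "details/<fingerprint>", the placeholder fingerprints, and the write-to-a-temporary-file-then-rename step are hypothetical additions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.List;

/** Hypothetical batch-side writer for steps 1.13 and 1.16. */
final class BatchWriter {
  private final Path dataDir;

  BatchWriter(Path dataDir) {
    this.dataDir = dataDir;
  }

  void writeRun(List<String> fingerprints) throws IOException {
    Path detailsDir = dataDir.resolve("details");
    Files.createDirectories(detailsDir);
    for (String fingerprint : fingerprints) {
      // Step 1.13: one pre-rendered JSON document per relay or bridge,
      // stored under its fingerprint.
      String json = "{\"fingerprint\":\"" + fingerprint + "\"}";
      Files.write(detailsDir.resolve(fingerprint),
          json.getBytes(StandardCharsets.UTF_8));
    }
    // Step 1.16: the summary file is written last, so every fingerprint it
    // lists already has a details file. The temporary file plus rename is an
    // assumption added here, not something the DESIGN document prescribes.
    Path tmp = dataDir.resolve("summary.tmp");
    Files.write(tmp, fingerprints, StandardCharsets.UTF_8);
    Files.move(tmp, dataDir.resolve("summary"),
        StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
  }
}

/** Hypothetical web-service-side cache for steps 2.1 to 2.3. */
final class SummaryCache {
  private final Path summaryFile;
  private FileTime loadedModificationTime;
  private List<String> fingerprints = List.of();

  SummaryCache(Path dataDir) {
    this.summaryFile = dataDir.resolve("summary");
  }

  /** Reloads the in-memory view only if the summary file has changed. */
  synchronized boolean refreshIfChanged() throws IOException {
    if (!Files.exists(summaryFile)) {
      return false;
    }
    FileTime modified = Files.getLastModifiedTime(summaryFile);
    if (modified.equals(loadedModificationTime)) {
      return false;
    }
    fingerprints = List.copyOf(
        Files.readAllLines(summaryFile, StandardCharsets.UTF_8));
    loadedModificationTime = modified;
    return true;
  }

  synchronized List<String> fingerprints() {
    return fingerprints;
  }
}

final class SummaryHandshakeDemo {
  public static void main(String[] args) throws IOException {
    Path dataDir = Files.createTempDirectory("onionoo-sketch");
    SummaryCache cache = new SummaryCache(dataDir);
    new BatchWriter(dataDir).writeRun(List.of("AAAA", "BBBB"));
    System.out.println(cache.refreshIfChanged()); // true: first load
    System.out.println(cache.refreshIfChanged()); // false: unchanged
    System.out.println(cache.fingerprints()); // [AAAA, BBBB]
  }
}
```

Renaming a fully written temporary file means a reader sees either the old or the new summary, never a partial one; on a file system that cannot move atomically, Files.move with ATOMIC_MOVE throws AtomicMoveNotSupportedException instead of silently falling back to a copy.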
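Step 1.7 of the deleted DESIGN document asks for background reverse DNS lookups with a 12-hour refresh interval and at most five lookups in parallel, and step 1.12 requires that they finish or time out before the details files are rewritten. A minimal Java sketch of that scheduling follows, assuming a pluggable resolver function so it can be exercised without network access; the class ReverseDnsLookups and its methods are hypothetical. Java's InetAddress offers no per-call timeout, so this sketch enforces only the overall timeout from step 1.12 and leaves per-request timeouts to the system resolver configuration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

/** Hypothetical helper for steps 1.7 and 1.12 of the DESIGN document. */
final class ReverseDnsLookups {
  static final long REFRESH_MILLIS = 12L * 60 * 60 * 1000; // 12 hours
  static final int PARALLEL_LOOKUPS = 5;

  private final Map<String, Long> lastLookupMillis = new ConcurrentHashMap<>();
  private final Map<String, String> hostNames = new ConcurrentHashMap<>();
  private final UnaryOperator<String> resolver;
  private ExecutorService pool;

  /** The resolver maps an IP address to a host name, or null on failure. */
  ReverseDnsLookups(UnaryOperator<String> resolver) {
    this.resolver = resolver;
  }

  /** Resolver backed by the system's reverse lookup via InetAddress. */
  static String systemResolver(String address) {
    try {
      String name = InetAddress.getByName(address).getCanonicalHostName();
      // getCanonicalHostName() returns the literal address if it cannot resolve.
      return name.equals(address) ? null : name;
    } catch (UnknownHostException e) {
      return null;
    }
  }

  /** Step 1.7: start lookups in the background, at most five at a time. */
  void start(Collection<String> addresses, long nowMillis) {
    pool = Executors.newFixedThreadPool(PARALLEL_LOOKUPS);
    for (String address : addresses) {
      Long last = lastLookupMillis.get(address);
      if (last != null && nowMillis - last < REFRESH_MILLIS) {
        continue; // looked up less than 12 hours ago; keep the cached name
      }
      pool.execute(() -> {
        String name = resolver.apply(address);
        if (name != null) {
          hostNames.put(address, name);
        }
        // Failed lookups are recorded too, so they wait 12 hours to retry.
        lastLookupMillis.put(address, nowMillis);
      });
    }
    pool.shutdown(); // queued lookups still run; no new ones are accepted
  }

  /** Step 1.12: wait for lookups, giving up once the overall timeout expires. */
  boolean awaitCompletion(long timeoutMillis) throws InterruptedException {
    boolean finished = pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    if (!finished) {
      pool.shutdownNow(); // interrupts workers; a blocked lookup may linger
    }
    return finished;
  }

  String hostName(String address) {
    return hostNames.get(address);
  }

  public static void main(String[] args) throws InterruptedException {
    ReverseDnsLookups lookups = new ReverseDnsLookups(ReverseDnsLookups::systemResolver);
    lookups.start(List.of("127.0.0.1"), System.currentTimeMillis());
    lookups.awaitCompletion(10_000);
    System.out.println(lookups.hostName("127.0.0.1"));
  }
}
```

A fixed pool of five threads gives the parallelism bound directly, and shutdown() followed by awaitTermination() turns "finished or timed out" into a single call. Recording failed lookups so they are not retried for 12 hours is a choice of this sketch; the DESIGN document does not say how failures should be handled.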
diff --git a/build.xml b/build.xml
index 5f1d798..26620d8 100644
--- a/build.xml
+++ b/build.xml
@@ -12,7 +12,7 @@
<property name="release.version"
value="${onionoo.protocol.version}-1.0.1-dev"/>
<property name="descriptorversion" value="1.5.0"/>
- <property name="jetty.version" value="" />
+ <property name="jetty.version" value="-8.1.16.v20140903" />
<property name="warfile"
value="onionoo-${release.version}.war"/>
diff --git a/src/main/resources/bootstrap-development.sh b/src/main/resources/bootstrap-development.sh
old mode 100644
new mode 100755
diff --git a/src/test/resources/metrics_checks.xml b/src/test/resources/metrics_checks.xml
deleted file mode 100644
index 6ba415a..0000000
--- a/src/test/resources/metrics_checks.xml
+++ /dev/null
@@ -1,219 +0,0 @@
-<?xml version="1.0"?>
-<!DOCTYPE module PUBLIC
- "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
- "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
-
-<!--
- Checkstyle configuration that checks the Google coding conventions from Google Java Style
- that can be found at https://google.github.io/styleguide/javaguide.html with the following
- modifications:
-
- - Replaced com.google with org.torproject in import statement ordering
- [CustomImportOrder].
-
- - Relaxed requirement that catch parameters must be at least two
- characters long [CatchParameterName].
-
- Checkstyle is very configurable. Be sure to read the documentation at
- http://checkstyle.sf.net (or in your downloaded distribution).
-
- To completely disable a check, just comment it out or delete it from the file.
-
- Authors: Max Vetrenko, Ruslan Diachenko, Roman Ivanov.
- -->
-
-<module name = "Checker">
- <property name="charset" value="UTF-8"/>
-
- <property name="severity" value="warning"/>
-
- <property name="fileExtensions" value="java, properties, xml"/>
- <!-- Checks for whitespace -->
- <!-- See http://checkstyle.sf.net/config_whitespace.html -->
- <module name="FileTabCharacter">
- <property name="eachLine" value="true"/>
- </module>
-
- <module name="SuppressWarningsFilter" />
- <module name="TreeWalker">
- <module name="OuterTypeFilename"/>
- <module name="IllegalTokenText">
- <property name="tokens" value="STRING_LITERAL, CHAR_LITERAL"/>
- <property name="format" value="\\u00(08|09|0(a|A)|0(c|C)|0(d|D)|22|27|5(C|c))|\\(0(10|11|12|14|15|42|47)|134)"/>
- <property name="message" value="Avoid using corresponding octal or Unicode escape."/>
- </module>
- <module name="AvoidEscapedUnicodeCharacters">
- <property name="allowEscapesForControlCharacters" value="true"/>
- <property name="allowByTailComment" value="true"/>
- <property name="allowNonPrintableEscapes" value="true"/>
- </module>
- <module name="LineLength">
- <property name="max" value="80"/>
- <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/>
- </module>
- <module name="AvoidStarImport"/>
- <module name="OneTopLevelClass"/>
- <module name="NoLineWrap"/>
- <module name="EmptyBlock">
- <property name="option" value="TEXT"/>
- <property name="tokens" value="LITERAL_TRY, LITERAL_FINALLY, LITERAL_IF, LITERAL_ELSE, LITERAL_SWITCH"/>
- </module>
- <module name="NeedBraces"/>
- <module name="LeftCurly">
- <property name="maxLineLength" value="100"/>
- </module>
- <module name="RightCurly"/>
- <module name="RightCurly">
- <property name="option" value="alone"/>
- <property name="tokens" value="CLASS_DEF, METHOD_DEF, CTOR_DEF, LITERAL_FOR, LITERAL_WHILE, LITERAL_DO, STATIC_INIT, INSTANCE_INIT"/>
- </module>
- <module name="WhitespaceAround">
- <property name="allowEmptyConstructors" value="true"/>
- <property name="allowEmptyMethods" value="true"/>
- <property name="allowEmptyTypes" value="true"/>
- <property name="allowEmptyLoops" value="true"/>
- <message key="ws.notFollowed"
- value="WhitespaceAround: ''{0}'' is not followed by whitespace. Empty blocks may only be represented as '{}' when not part of a multi-block statement (4.1.3)"/>
- <message key="ws.notPreceded"
- value="WhitespaceAround: ''{0}'' is not preceded with whitespace."/>
- </module>
- <module name="OneStatementPerLine"/>
- <module name="MultipleVariableDeclarations"/>
- <module name="ArrayTypeStyle"/>
- <module name="MissingSwitchDefault"/>
- <module name="FallThrough"/>
- <module name="UpperEll"/>
- <module name="ModifierOrder"/>
- <module name="EmptyLineSeparator">
- <property name="allowNoEmptyLineBetweenFields" value="true"/>
- </module>
- <module name="SeparatorWrap">
- <property name="tokens" value="DOT"/>
- <property name="option" value="nl"/>
- </module>
- <module name="SeparatorWrap">
- <property name="tokens" value="COMMA"/>
- <property name="option" value="EOL"/>
- </module>
- <module name="PackageName">
- <property name="format" value="^[a-z]+(\.[a-z][a-z0-9]*)*$"/>
- <message key="name.invalidPattern"
- value="Package name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="TypeName">
- <message key="name.invalidPattern"
- value="Type name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="MemberName">
- <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9]*$"/>
- <message key="name.invalidPattern"
- value="Member name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="ParameterName">
- <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9]*$"/>
- <message key="name.invalidPattern"
- value="Parameter name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="CatchParameterName">
- <property name="format" value="^[a-z][a-zA-Z0-9]*$"/>
- <message key="name.invalidPattern"
- value="Catch parameter name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="LocalVariableName">
- <property name="tokens" value="VARIABLE_DEF"/>
- <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9]*$"/>
- <property name="allowOneCharVarInForLoop" value="true"/>
- <message key="name.invalidPattern"
- value="Local variable name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="ClassTypeParameterName">
- <property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
- <message key="name.invalidPattern"
- value="Class type name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="MethodTypeParameterName">
- <property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
- <message key="name.invalidPattern"
- value="Method type name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="InterfaceTypeParameterName">
- <property name="format" value="(^[A-Z][0-9]?)$|([A-Z][a-zA-Z0-9]*[T]$)"/>
- <message key="name.invalidPattern"
- value="Interface type name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="NoFinalizer"/>
- <module name="GenericWhitespace">
- <message key="ws.followed"
- value="GenericWhitespace ''{0}'' is followed by whitespace."/>
- <message key="ws.preceded"
- value="GenericWhitespace ''{0}'' is preceded with whitespace."/>
- <message key="ws.illegalFollow"
- value="GenericWhitespace ''{0}'' should be followed by whitespace."/>
- <message key="ws.notPreceded"
- value="GenericWhitespace ''{0}'' is not preceded with whitespace."/>
- </module>
- <module name="Indentation">
- <property name="basicOffset" value="2"/>
- <property name="braceAdjustment" value="0"/>
- <property name="caseIndent" value="2"/>
- <property name="throwsIndent" value="4"/>
- <property name="lineWrappingIndentation" value="4"/>
- <property name="arrayInitIndent" value="2"/>
- </module>
- <module name="AbbreviationAsWordInName">
- <property name="ignoreFinal" value="false"/>
- <property name="allowedAbbreviationLength" value="1"/>
- </module>
- <module name="OverloadMethodsDeclarationOrder"/>
- <module name="VariableDeclarationUsageDistance"/>
- <module name="CustomImportOrder">
- <property name="specialImportsRegExp" value="org.torproject"/>
- <property name="sortImportsInGroupAlphabetically" value="true"/>
- <property name="customImportOrderRules" value="STATIC###SPECIAL_IMPORTS###THIRD_PARTY_PACKAGE###STANDARD_JAVA_PACKAGE"/>
- </module>
- <module name="MethodParamPad"/>
- <module name="OperatorWrap">
- <property name="option" value="NL"/>
- <property name="tokens" value="BAND, BOR, BSR, BXOR, DIV, EQUAL, GE, GT, LAND, LE, LITERAL_INSTANCEOF, LOR, LT, MINUS, MOD, NOT_EQUAL, PLUS, QUESTION, SL, SR, STAR "/>
- </module>
- <module name="AnnotationLocation">
- <property name="tokens" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF"/>
- </module>
- <module name="AnnotationLocation">
- <property name="tokens" value="VARIABLE_DEF"/>
- <property name="allowSamelineMultipleAnnotations" value="true"/>
- </module>
- <module name="NonEmptyAtclauseDescription"/>
- <module name="JavadocTagContinuationIndentation"/>
- <module name="SummaryJavadoc">
- <property name="forbiddenSummaryFragments" value="^@return the *|^This method returns |^A [{]@code [a-zA-Z0-9]+[}]( is a )"/>
- </module>
- <module name="JavadocParagraph"/>
- <module name="AtclauseOrder">
- <property name="tagOrder" value="@param, @return, @throws, @deprecated"/>
- <property name="target" value="CLASS_DEF, INTERFACE_DEF, ENUM_DEF, METHOD_DEF, CTOR_DEF, VARIABLE_DEF"/>
- </module>
- <module name="JavadocMethod">
- <property name="scope" value="public"/>
- <property name="allowMissingParamTags" value="true"/>
- <property name="allowMissingThrowsTags" value="true"/>
- <property name="allowMissingReturnTag" value="true"/>
- <property name="minLineCount" value="2"/>
- <property name="allowedAnnotations" value="Override, Test"/>
- <property name="allowThrowsTagsForSubclasses" value="true"/>
- </module>
- <module name="MethodName">
- <property name="format" value="^[a-z][a-z0-9][a-zA-Z0-9_]*$"/>
- <message key="name.invalidPattern"
- value="Method name ''{0}'' must match pattern ''{1}''."/>
- </module>
- <module name="SingleLineJavadoc">
- <property name="ignoreInlineTags" value="false"/>
- </module>
- <module name="EmptyCatchBlock">
- <property name="exceptionVariableName" value="expected"/>
- </module>
- <module name="CommentsIndentation"/>
- <module name="SuppressWarningsHolder" />
- </module>
-</module>
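The deleted metrics_checks.xml encodes Onionoo's Java style: Google style with two-space indentation (basicOffset 2, lineWrappingIndentation 4), imports grouped as static, then org.torproject, then third-party, then java/javax packages (CustomImportOrder), binary operators such as + placed at the start of a wrapped line (OperatorWrap with option NL), and one-letter catch parameters allowed (the relaxed CatchParameterName). The hypothetical class below is a sketch intended to follow those rules; it has not been run through Checkstyle, and it contains no org.torproject or third-party import so that it compiles on its own.

```java
import static java.util.Objects.requireNonNull;

import java.util.ArrayList;
import java.util.List;

/** Collects relay nicknames; a hypothetical class in the checked style. */
final class NicknameList {

  private final List<String> nicknames = new ArrayList<>();

  /** Adds a nickname and rejects null values. */
  public void add(String nickname) {
    nicknames.add(requireNonNull(nickname, "nickname"));
  }

  /** Parses a count and falls back to zero on malformed input. */
  public static int parseCountOrZero(String text) {
    try {
      return Integer.parseInt(text.trim());
    } catch (NumberFormatException e) {
      // The relaxed CatchParameterName pattern permits the name "e".
      return 0;
    }
  }

  /** Builds a one-line summary of the collected nicknames. */
  public String summaryLine() {
    // OperatorWrap (NL): the "+" starts the wrapped line, indented by 4.
    return "count=" + nicknames.size()
        + ", nicknames=" + String.join(",", nicknames);
  }

  /** Prints a summary line for two sample nicknames. */
  public static void main(String[] args) {
    NicknameList list = new NicknameList();
    list.add("moria1");
    list.add("gabelmoo");
    System.out.println(list.summaryLine()); // count=2, nicknames=moria1,gabelmoo
  }
}
```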