[tor-commits] [metrics-db/master] Extend new CollecTor homepage.
karsten at torproject.org
Wed Jun 4 15:20:40 UTC 2014
commit 447e394f0c69124ee8254f55e3762e1b1e4efd73
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date: Wed Jun 4 17:19:42 2014 +0200
Extend new CollecTor homepage.
At this point, everything relevant from the metrics website should be
included.
---
web/css/style.css | 31 +--
web/formats.html | 551 +++++++++++++++++++++++++++++++++++++++++++++++++++++
web/index.html | 151 +++++++++++----
3 files changed, 673 insertions(+), 60 deletions(-)
diff --git a/web/css/style.css b/web/css/style.css
index 9978a5d..344d8e6 100644
--- a/web/css/style.css
+++ b/web/css/style.css
@@ -1,38 +1,13 @@
body { font-family: "Open Sans","lucida grande","Segoe UI",arial,verdana,
"lucida sans unicode",tahoma,sans-serif; background: #fafafa;
font-size: 13px; line-height: 22px; color: #222; }
-h1 { font-size: 20px; font-weight: normal; text-align: center; }
-h3 { color: #7D4698; position: relative }
a { color: #7D4698; text-decoration: none; font-weight: bold; }
-ul { list-style: none; padding: 0; margin: 0; }
p { margin: 0; padding: 10px; }
a[name] { padding: 0; margin: 0; }
.box { max-width: 850px; width: 100%; margin: 0 auto 30px auto;
padding-bottom: 30px; background: white; border: 1px solid #eee; }
.box > * { margin-left: 30px; margin-right: 30px; }
-.box h3 a { visibility: hidden; }
-.box:hover h3 a { visibility: visible; }
-.api-request { border-bottom: 1px solid #eee; position: relative }
-.request-url, .request-type, .request-response { padding: 8px 10px;
- vertical-align: middle }
-.request-type { color: #57145F; display: inline-block; }
-.request-url { color: #333; font-size: 18px; }
-.request-response { position: absolute; color: #666; right: 0; }
-h3 .request-response { padding: 0 !important; }
-.api-urls>li:last-child { border-bottom: 0; }
-.required-true, .required-false, .typeof { display: inline-block;
- vertical-align: middle; padding: 5px 10px; }
-.required-true { color: #1d7508; }
-.required-false { color: #aaa; }
-.properties { margin-top: 10px; margin-bottom: 10px;
- border: 1px solid #eee; }
-.properties li { padding: 5px 0; }
-.properties li ul { border: 1px solid #eee; margin: 10px 10px 10px 40px;
- background: white; }
-.properties .properties { margin-left: 10px; }
-.properties li:nth-child(even) { background: #fafafa; }
-.properties p { padding: 10px 15px; }
-.properties b { padding: 5px 10px; display: inline-block;
- vertical-align: middle; }
-.api-urls{ margin-top: 30px; margin-bottom: 30px; }
+.box h2 a { visibility: hidden; }
+.box:hover h2 a { visibility: visible; }
+h3 .type-annotation { float: right; color: #666; }
diff --git a/web/formats.html b/web/formats.html
new file mode 100644
index 0000000..68fddee
--- /dev/null
+++ b/web/formats.html
@@ -0,0 +1,551 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<html>
+<head>
+<title>CollecTor — What is in the data?</title>
+<link href="css/style.css" type="text/css" rel="stylesheet">
+<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
+<link href="favicon.ico" type="image/x-icon" rel="shortcut icon">
+</head>
+<body>
+
+<div class="box">
+
+<h1><a href="index.html">CollecTor</a> —</h1>
+<h2>What is in the data?</h2>
+
+<p>
+The Tor network data provided here comes from five different sources which
+are explained in more detail on this page.
+You may either read through the entire page or jump to the type of data
+you're most interested in:
+
+<ul>
+<li><a href="#relay-descriptors">Tor relay descriptors</a></li>
+<li><a href="#bridge-descriptors">Tor bridge descriptors</a></li>
+<li><a href="#bridge-pool-assignments">BridgeDB's bridge pool
+assignments</a></li>
+<li><a href="#exit-lists">TorDNSEL's exit lists</a></li>
+<li><a href="#torperf">Torperf's performance data</a></li>
+</ul>
+
+<p>
+Each descriptor provided here contains an <tt>@type</tt> annotation using
+the format <tt>@type $descriptortype $major.$minor</tt>.
+Any tool that processes these descriptors may parse files without meta
+data or with an unknown descriptor type at its own risk, can safely parse
+files with known descriptor type and same major version number, and should
+not parse files with known descriptor type and higher major version
+number.
+</p>
+
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="relay-descriptors"></a>
+<h2>Tor relay descriptors <a href="#relay-descriptors">#</a></h2>
+
+<p>
+Relays and directory authorities publish relay descriptors, so that
+clients can select relays for their paths through the Tor network.
+All these relay descriptors are specified in the
+<a href="https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec.txt">Tor
+directory protocol, version 3</a> specification document (or in the
+earlier protocol versions
+<a href="https://gitweb.torproject.org/torspec.git/blob/HEAD:/dir-spec-v2.txt">2</a> or
+<a href="https://gitweb.torproject.org/torspec.git/blob/HEAD:/attic/dir-spec-v1.txt">1</a>).
+This page shall give a quick overview of what relay descriptors are
+available.
+</p>
+
+<h3>Server descriptors
+(<a href="archive/relay-descriptors/server-descriptors/">archive</a>,
+<a href="recent/relay-descriptors/server-descriptors/">recent</a>)
+<span class="type-annotation"><tt>@type server-descriptor 1.0</tt></span>
+</h3>
+
+<p>
+Server descriptors contain information that relays publish about
+themselves.
+Tor clients once downloaded this information, but now they use
+microdescriptors instead.
+The server descriptors in
+<a href="archive/relay-descriptors/server-descriptors/">archive</a>
+contain one descriptor per file, whereas the files in
+<a href="recent/relay-descriptors/server-descriptors/">recent</a>
+contain all descriptors collected in an hour concatenated into a single
+file.
+</p>
+
+<h3>Extra-info descriptors
+(<a href="archive/relay-descriptors/extra-infos/">archive</a>,
+<a href="recent/relay-descriptors/extra-infos/">recent</a>)
+<span class="type-annotation"><tt>@type extra-info 1.0</tt></span>
+</h3>
+
+<p>
+Extra-info descriptors contain relay information that Tor clients do not
+need in order to function.
+This is self-published, like server descriptors, but not downloaded by
+clients by default.
+The extra-info descriptors in
+<a href="archive/relay-descriptors/extra-infos/">archive</a>
+contain one descriptor per file, whereas the files in
+<a href="recent/relay-descriptors/extra-infos/">recent</a>
+contain all descriptors collected in an hour concatenated into a single
+file.
+</p>
+
+<h3>Network status consensuses
+(<a href="archive/relay-descriptors/consensuses/">archive</a>,
+<a href="recent/relay-descriptors/consensuses/">recent</a>)
+<span class="type-annotation"><tt>@type network-status-consensus-3
+1.0</tt></span>
+</h3>
+
+<p>
+Though Tor relays are decentralized, the directories that track the
+overall network are not.
+These central points are called directory authorities, and every hour they
+publish a document called a consensus, or network status document.
+The consensus in turn is made up of router status entries containing
+flags, heuristics used for relay selection, etc.
+</p>
+
+<h3>Network status votes
+(<a href="archive/relay-descriptors/votes/">archive</a>,
+<a href="recent/relay-descriptors/votes/">recent</a>)
+<span class="type-annotation"><tt>@type network-status-vote-3
+1.0</tt></span>
+</h3>
+
+<p>
+The directory authorities exchange votes every hour to come up with a
+common consensus.
+Vote documents are by far the largest documents provided here.
+</p>
+
+<h3>Directory key certificates
+(<a href="archive/relay-descriptors/certs.xz">archive</a>)
+<span class="type-annotation"><tt>@type dir-key-certificate-3
+1.0</tt></span>
+</h3>
+
+<p>
+The directory authorities sign their votes and the consensus with their
+key that they publish in a key certificate.
+These key certificates change once every few months, so they are only
+available in the
+<a href="archive/relay-descriptors/certs.xz">archive</a>.
+</p>
+
+<h3>Microdescriptor consensuses
+(<a href="archive/relay-descriptors/microdescs/">archive</a>,
+<a href="recent/relay-descriptors/microdescs/">recent</a>)
+<span class="type-annotation"><tt>@type
+network-status-microdesc-consensus-3 1.0</tt></span>
+</h3>
+
+<p>
+Tor clients used to download all server descriptors of active relays, but
+now they only download the smaller microdescriptors which are derived from
+server descriptors.
+The microdescriptor consensus lists all active relays and references their
+currently used microdescriptor.
+The tarballs in
+<a href="archive/relay-descriptors/microdescs/">archive</a>
+contain both microdescriptor consensuses and referenced microdescriptors
+together.
+</p>
+
+<h3>Microdescriptors
+(<a href="archive/relay-descriptors/microdescs/">archive</a>,
+<a href="recent/relay-descriptors/microdescs/">recent</a>)
+<span class="type-annotation"><tt>@type microdescriptor 1.0</tt></span>
+</h3>
+
+<p>
+Microdescriptors are minimalistic documents that just include the
+information necessary for Tor clients to work.
+The tarballs in
+<a href="archive/relay-descriptors/microdescs/">archive</a>
+contain both microdescriptor consensuses and referenced microdescriptors
+together.
+The microdescriptors in
+<a href="archive/relay-descriptors/microdescs/">archive</a>
+contain one descriptor per file, whereas the files in
+<a href="recent/relay-descriptors/microdescs/">recent</a>
+contain all descriptors collected in an hour concatenated into a single
+file.
+</p>
+
+<h3>Version 2 network statuses
+(<a href="archive/relay-descriptors/statuses/">archive</a>)
+<span class="type-annotation"><tt>@type network-status-2 1.0</tt></span>
+</h3>
+
+<p>
+Version 2 network statuses were published by the directory
+authorities before consensuses were introduced.
+In contrast to consensuses, each directory authority published their own
+authoritative view on the network, and clients combined these documents
+locally.
+We stopped archiving version 2 network statuses in 2012.
+</p>
+
+<h3>Version 1 directories
+(<a href="archive/relay-descriptors/tor/">archive</a>)
+<span class="type-annotation"><tt>@type directory 1.0</tt></span>
+</h3>
+
+<p>
+The first directory protocol version combined the list of active relays
+with server descriptors in a single directory document.
+We stopped archiving version 1 directories in 2007.
+</p>
+
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="bridge-descriptors"></a>
+<h2>Tor bridge descriptors <a href="#bridge-descriptors">#</a></h2>
+
+<p>
+Bridges and the bridge authority publish bridge descriptors that are used
+by censored clients to connect to the Tor network.
+We cannot, however, make bridge descriptors available as we do with relay
+descriptors, because that would defeat the purpose of making bridges hard
+to enumerate for censors.
+We therefore sanitize bridge descriptors by removing all potentially
+identifying information and publish sanitized versions here.
+The sanitizing steps are as follows:
+</p>
+
+<ol>
+<li><b>Replace the bridge identity with its SHA-1 hash:</b> Clients
+can request a bridge's current descriptor by sending its identity string
+to the bridge authority.
+This is a feature to make bridges on dynamic IP addresses useful.
+Therefore, the original identities (and anything that could be used to
+derive them) need to be removed from the descriptors.
+The bridge identity is replaced with its SHA-1 hash value.
+The idea is to have a consistent replacement that remains stable over
+months or even years (without keeping a secret for a keyed hash
+function).</li>
+<li><b>Remove all cryptographic keys and signatures:</b> It would be
+straightforward to learn about the bridge identity from the bridge's
+public key.
+Replacing keys with newly generated ones seemed unnecessary (and would
+involve keeping state over months/years), so all cryptographic
+objects are simply removed.</li>
+<li><b>Replace IP address with IP address hash:</b> Of course, IP
+addresses need to be sanitized, too.
+<ul><li>IPv4 addresses are replaced with <tt>10.x.x.x</tt> with
+<tt>x.x.x</tt> being the 3 byte output of
+<tt>H(IP address | bridge identity | secret)[:3]</tt>.
+The input <tt>IP address</tt> is the 4-byte long binary representation of
+the bridge's current IP address.
+The <tt>bridge identity</tt> is the 20-byte long binary representation of
+the bridge's long-term identity fingerprint.
+The <tt>secret</tt> is a 31-byte long secure random string that changes
+once per month for all descriptors and statuses published in that month.
+<tt>H()</tt> is SHA-256.
+The <tt>[:3]</tt> operator means that we pick the 3 most significant bytes
+of the result.</li>
+<li>IPv6 addresses are replaced with <tt>[fd9f:2e19:3bcf::xx:xxxx]</tt>
+with <tt>xx:xxxx</tt> being the hex-formatted 3 byte output of a similar
+hash function as described for IPv4 addresses.
+The only differences are that the input <tt>IP address</tt> is 16 bytes
+long and the <tt>secret</tt> is only 19 bytes long.</li></ul></li>
+<li><b>Replace contact information:</b> If there is contact information in
+a descriptor, the contact line is changed to
+<tt>somebody</tt>.</li>
+<li><b>Remove pluggable transport addresses and arguments:</b> Bridges may
+provide transports in addition to the onion-routing protocol and include
+information about these transports in their extra-info descriptors for
+BridgeDB.
+In that case, any IP addresses, TCP ports, or additional arguments are
+removed, only leaving in the supported transport names.</li>
+<li><b>Append descriptor digest:</b> Descriptors are often referenced by
+their digest, but that is not possible anymore once their content is
+changed.
+As a workaround, sanitized descriptors may contain a new line
+<tt>router-digest</tt> with the hex representation of the SHA-1 of the
+original descriptor digest.</li>
+</ol>
+
+<h3>Network statuses
+(<a href="archive/bridge-descriptors/">archive</a>,
+<a href="recent/bridge-descriptors/statuses/">recent</a>)
+<span class="type-annotation"><tt>@type bridge-network-status
+1.0</tt></span>
+</h3>
+
+<p>
+Sanitized bridge network statuses are similar to version 2 relay network
+statuses, but with only a <tt>published</tt> line in the header and
+without any lines in the footer.
+The tarballs in
+<a href="archive/bridge-descriptors/">archive</a> contain all bridge
+descriptors of a given month, not just network statuses.
+</p>
+
+<h3>Server descriptors
+(<a href="archive/bridge-descriptors/">archive</a>,
+<a href="recent/bridge-descriptors/server-descriptors/">recent</a>)
+<span class="type-annotation"><tt>@type bridge-server-descriptor
+1.0</tt></span>
+</h3>
+
+<p>
+Bridge server descriptors follow the same format as relay server
+descriptors, except for the sanitizing steps described above.
+The tarballs in
+<a href="archive/bridge-descriptors/">archive</a> contain all bridge
+descriptors of a given month, not just server descriptors.
+These tarballs contain one descriptor per file, whereas the
+files in
+<a href="recent/bridge-descriptors/server-descriptors/">recent</a>
+contain all descriptors collected in an hour concatenated into a single
+file to reduce the number of files.
+</p>
+
+<h3>Extra-info descriptors
+(<a href="archive/bridge-descriptors/">archive</a>,
+<a href="recent/bridge-descriptors/extra-infos/">recent</a>)
+<span class="type-annotation"><tt>@type bridge-extra-info 1.2</tt></span>
+</h3>
+
+<p>
+Bridge extra-info descriptors follow the same format as relay extra-info
+descriptors, except for the sanitizing steps described above.
+The format has changed over time to accommodate changes to the sanitizing
+process, with earlier versions being:
+</p>
+
+<ul>
+<li><font color="#666"><tt>@type bridge-extra-info 1.0</tt> was the first
+version.</font></li>
+<li><font color="#666"><tt>@type bridge-extra-info 1.1</tt> added
+sanitized <tt>transport</tt> lines</font>.</li>
+<li><tt>@type bridge-extra-info 1.2</tt> added <tt>ntor-onion-key</tt>
+lines.</li>
+</ul>
+
+<p>
+The tarballs in
+<a href="archive/bridge-descriptors/">archive</a> contain all bridge
+descriptors of a given month, not just extra-info descriptors.
+These tarballs contain one descriptor per file, whereas the
+files in
+<a href="recent/bridge-descriptors/extra-infos/">recent</a>
+contain all descriptors collected in an hour concatenated into a single
+file to reduce the number of files.
+</p>
+
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="bridge-pool-assignments"></a>
+<h2>BridgeDB's bridge pool assignments
+<a href="#bridge-pool-assignments">#</a></h2>
+
+<p>
+The bridge distribution service BridgeDB publishes bridge pool assignments
+describing which bridges it has assigned to which distribution pool.
+BridgeDB receives bridge network statuses from the bridge authority,
+assigns these bridges to persistent distribution rings, and hands them out
+to bridge users.
+BridgeDB periodically dumps the list of running bridges with information
+about the rings, subrings, and file buckets to which they are assigned to
+a local file.
+The sanitized versions of these lists containing SHA-1 hashes of bridge
+fingerprints instead of the original fingerprints are available for
+statistical analysis.
+</p>
+
+<h3>Bridge pool assignments
+(<a href="archive/bridge-pool-assignments/">archive</a>,
+<a href="recent/bridge-pool-assignments/">recent</a>)
+<span class="type-annotation"><tt>@type bridge-pool-assignment
+1.0</tt></span>
+</h3>
+
+<p>
+The document below shows a BridgeDB pool assignment file
+from March 13, 2011.
+Every such file begins with a line containing the timestamp when BridgeDB
+wrote this file.
+Subsequent lines start with the SHA-1 hash of a bridge fingerprint,
+followed by ring, subring, and/or file bucket information.
+There are currently three distributor ring types in BridgeDB:
+</p>
+
+<ol>
+<li><b>unallocated:</b> These bridges are not distributed by BridgeDB,
+but are either reserved for manual distribution or are written to file
+buckets for distribution via an external tool.
+If a bridge in the <tt>unallocated</tt> ring is assigned to a file bucket,
+this is noted by <tt>bucket=$bucketname</tt>.</li>
+<li><b>email:</b> These bridges are distributed via an e-mail
+autoresponder. Bridges can be assigned to subrings by their OR port or
+relay flag which is defined by <tt>port=$port</tt> and/or <tt>flag=$flag</tt>.
+</li>
+<li><b>https:</b> These bridges are distributed via an HTTPS server.
+There are multiple https rings to further distribute bridges by IP address
+ranges, which is denoted by <tt>ring=$ring</tt>.
+Bridges in the <tt>https</tt> ring can also be assigned to subrings by
+OR port or relay flag which is defined by <tt>port=$port</tt> and/or
+<tt>flag=$flag</tt>.</li>
+</ol>
+
+<pre>
+bridge-pool-assignment 2011-03-13 14:38:03
+00b834117566035736fc6bd4ece950eace8e057a unallocated
+00e923e7a8d87d28954fee7503e480f3a03ce4ee email port=443 flag=stable
+0103bb5b00ad3102b2dbafe9ce709a0a7c1060e4 https ring=2 port=443 flag=stable
+[...]
+</pre>
+
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="exit-lists"></a>
+<h2>TorDNSEL's exit lists <a href="#exit-lists">#</a></h2>
+
+<p>
+The exit list service
+<a href="https://www.torproject.org/tordnsel/dist/">TorDNSEL</a>
+publishes exit lists containing the IP addresses of relays that it found
+when exiting through them.
+</p>
+
+<h3>Exit lists
+(<a href="archive/exit-lists/">archive</a>,
+<a href="recent/exit-lists/">recent</a>)
+<span class="type-annotation"><tt>@type tordnsel 1.0</tt></span>
+</h3>
+
+<p>
+Tor Check makes the list of known exits and corresponding exit IP
+addresses available in a specific format.
+The document below shows an entry of the exit list written on
+December 28, 2010 at 15:21:44 UTC.
+This entry means that the relay with fingerprint <tt>63BA..</tt> which
+published a descriptor at 07:35:55 and was contained in a version 2
+network status from 08:10:11 uses two different IP addresses for exiting.
+The first address <tt>91.102.152.236</tt> was found in a test performed at
+07:10:30.
+When looking at the corresponding server descriptor, one finds that this
+is also the IP address on which the relay accepts connections from inside
+the Tor network.
+A second test performed at 10:35:30 reveals that the relay also uses IP
+address <tt>91.102.152.227</tt> for exiting.
+</p>
+
+<pre>
+ExitNode 63BA28370F543D175173E414D5450590D73E22DC
+Published 2010-12-28 07:35:55
+LastStatus 2010-12-28 08:10:11
+ExitAddress 91.102.152.236 2010-12-28 07:10:30
+ExitAddress 91.102.152.227 2010-12-28 10:35:30
+</pre>
+
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="torperf"></a>
+<h2>Torperf's performance data <a href="#torperf">#</a></h2>
+
+<p>
+The performance measurement service Torperf publishes performance data
+from making simple HTTP requests over the Tor network.
+Torperf uses a trivial SOCKS client to download files of various sizes
+over the Tor network and notes how long substeps take.
+</p>
+
+<h3>Torperf measurement results
+(<a href="archive/torperf/">archive</a>,
+<a href="recent/torperf/">recent</a>)
+<span class="type-annotation"><tt>@type torperf 1.0</tt></span>
+</h3>
+
+<p>
+A Torperf results file contains a single line per Torperf run with
+<tt>key=value</tt> pairs.
+Such a result line is sufficient to learn about 1) the Tor and Torperf
+configuration, 2) measurement results, and 3) additional information that
+might help explain the results.
+Known keys are explained below.
+</p>
+<ul>
+<li>Configuration
+<ul>
+<li><tt>SOURCE:</tt> Configured name of the data source; required.</li>
+<li><tt>FILESIZE:</tt> Configured file size in bytes; required.</li>
+<li>Other meta data describing the Tor or Torperf configuration, e.g.,
+GUARD for a custom guard choice; optional.</li>
+</ul>
+<li>Measurement results
+<ul>
+<li><tt>START:</tt> Time when the connection process starts;
+required.</li>
+<li><tt>SOCKET:</tt> Time when the socket was created; required.</li>
+<li><tt>CONNECT:</tt> Time when the socket was connected; required.</li>
+<li><tt>NEGOTIATE:</tt> Time when SOCKS 5 authentication methods have been
+negotiated; required.</li>
+<li><tt>REQUEST:</tt> Time when the SOCKS request was sent; required.</li>
+<li><tt>RESPONSE:</tt> Time when the SOCKS response was received;
+required.</li>
+<li><tt>DATAREQUEST:</tt> Time when the HTTP request was written;
+required.</li>
+<li><tt>DATARESPONSE:</tt> Time when the first response was received;
+required.</li>
+<li><tt>DATACOMPLETE:</tt> Time when the payload was complete;
+required.</li>
+<li><tt>WRITEBYTES:</tt> Total number of bytes written; required.</li>
+<li><tt>READBYTES:</tt> Total number of bytes read; required.</li>
+<li><tt>DIDTIMEOUT:</tt> 1 if the request timed out, 0 otherwise;
+optional.</li>
+<li><tt>DATAPERCx:</tt> Time when x% of expected bytes were read for
+x = { 10, 20, 30, 40, 50, 60, 70, 80, 90 }; optional.</li>
+<li>Other measurement results, e.g., START_RENDCIRC, GOT_INTROCIRC, etc.
+for hidden-service measurements; optional.</li>
+</ul>
+<li>Additional information
+<ul>
+<li><tt>LAUNCH:</tt> Time when the circuit was launched; optional.</li>
+<li><tt>USED_AT:</tt> Time when this circuit was used; optional.</li>
+<li><tt>PATH:</tt> List of relays in the circuit, separated by commas;
+optional.</li>
+<li><tt>BUILDTIMES:</tt> List of times when circuit hops were built,
+separated by commas; optional.</li>
+<li><tt>TIMEOUT:</tt> Circuit build timeout that the Tor client used when
+building this circuit; optional.</li>
+<li><tt>QUANTILE:</tt> Circuit build time quantile that the Tor client
+uses to determine its circuit-build timeout; optional.</li>
+<li><tt>CIRC_ID:</tt> Circuit identifier of the circuit used for this
+measurement; optional.</li>
+<li><tt>USED_BY:</tt> Stream identifier of the stream used for this
+measurement; optional.</li>
+<li>Other fields containing additional information; optional.</li>
+</ul>
+</ul>
+
+<p>
+The files in <a href="recent/torperf/">recent</a>
+accumulate all new Torperf measurements of a given day, which means that
+they may change throughout the day.
+This is different from all other files in the <a href="recent/">recent</a>
+directory which do not change once they are written.
+</p>
+
+</div> <!-- box -->
+
+</body>
+</html>
+
diff --git a/web/index.html b/web/index.html
index c687798..e4eadc2 100644
--- a/web/index.html
+++ b/web/index.html
@@ -1,7 +1,7 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
-<title>CollecTor — your friendly data-collecting service in the Tor
+<title>CollecTor — Your friendly data-collecting service in the Tor
network</title>
<link href="css/style.css" type="text/css" rel="stylesheet">
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
@@ -11,8 +11,8 @@ network</title>
<div class="box">
-<h1>CollecTor — your friendly data-collecting service in the Tor
-network</h1>
+<h1><a href="index.html">CollecTor</a> —</h1>
+<h2>Your friendly data-collecting service in the Tor network</h2>
<p>
Welcome to CollecTor, your friendly data-collecting service in the Tor
@@ -23,12 +23,51 @@ If you're doing research on the Tor network, or if you're developing an
application that uses Tor network data, this is your place to start.
</p>
+<ul>
+<li><a href="#formats">What is in the data?</a></li>
+<li><a href="#download">Where do I get the data?</a></li>
+<li><a href="#libraries">How can I parse the data?</a></li>
+<li><a href="#references">What did others do with the data?</a></li>
+<li><a href="#support">How can I get support?</a></li>
+</ul>
+
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="formats"></a>
+<h2>What is in the data? <a href="#formats">#</a></h2>
+
+<p>
+The Tor network data provided here currently comes from five different
+sources (each of which is explained in more detail on a
+<a href="formats.html">separate page</a>):
+</p>
+
+<ol>
+<li>Relays and directory authorities publish
+<a href="formats.html#relay-descriptors">relay descriptors</a>, so that
+clients can select relays for their paths through the Tor network.</li>
+<li>Bridges and the bridge authority publish
+<a href="formats.html#bridge-descriptors">bridge descriptors</a> that are
+used by censored clients to connect to the Tor network.</li>
+<li>The bridge distribution service BridgeDB publishes
+<a href="formats.html#bridge-pool-assignments">bridge pool assignments</a>
+describing which bridges it has assigned to which distribution pool.</li>
+<li>The exit list service TorDNSEL publishes
+<a href="formats.html#exit-lists">exit lists</a> containing the IP
+addresses of relays that it found when exiting through them.</li>
+<li>The performance measurement service Torperf publishes
+<a href="formats.html#torperf">performance data</a> from making simple
+HTTP requests over the Tor network.</li>
+</ol>
+
</div> <!-- box -->
<div class="box">
-<a name="archive"></a>
-<h3>Archive of monthly tarballs <a href="#archive">#</a></h3>
+<a name="download"></a>
+<h2>Where do I get the data? <a href="#download">#</a></h2>
<p>
We have over 10 years of Tor network data available for download in
@@ -36,56 +75,104 @@ monthly tarballs.
The latest tarballs are updated every few days.
So, if you want to fetch data covering an extended period of time, monthly
tarballs are for you.
-Note that tarballs can decompress to 20 times the compressed size or even
-more.
+Just be careful: these tarballs can decompress to 20 times the compressed
+size or even more.
+Monthly tarballs can be browsed and downloaded in the
+<a href="archive/"><tt>archive/</tt></a> subdirectory.
</p>
<p>
-Monthly tarballs can be browsed and downloaded here:
+If you're only interested in recently published data, we also have data
+from the last 72 hours available for you.
+In contrast to monthly tarballs, this data set is updated every hour.
+If you have already bootstrapped your application with monthly tarballs
+and want to stay up-to-date, or if you just want to take a peek at the
+latest data, this is your data set.
+If you're using special software to download these files, you may want to
+configure it to accept gzip-compressed data to save us all some bandwidth.
+The latest 72 hours of data are available in the
+<a href="recent/"><tt>recent/</tt></a> subdirectory.
</p>
-<pre>
- <a href="archive/">https://collector.torproject.org/archive/</a>
-</pre>
-
</div> <!-- box -->
<div class="box">
-<a name="recent"></a>
-<h3>The latest 72 hours <a href="#recent">#</a></h3>
+<a name="libraries"></a>
+<h2>How can I parse the data? <a href="#libraries">#</a></h2>
<p>
-If you're only interested in recently published data, we also have data
-from the last 72 hours available for you.
-In contrast to monthly tarballs, this data set is updated every hour.
-If you have already bootstrapped your application with monthly tarballs
-and want to stay up-to-date, or if you just want to take a peak at the
-latest data, this is your data set.
+We developed two parsing libraries, one for Java and one for Python:
</p>
+<ul>
+<li>If you're programming in Java, try out the
+<a href="https://gitweb.torproject.org/metrics-lib.git">metrics-lib</a>
+library.</li>
+<li>If you're writing in Python,
+<a href="https://stem.torproject.org/">Stem</a> is your library.</li>
+</ul>
+
<p>
-The latest 72 hours of data are also available here:
+If you developed a parsing library for another language and want it to be
+listed here, <a href="#support">please let us know</a>!
</p>
-<pre>
- <a href="recent/">https://collector.torproject.org/recent/</a>
-</pre>
+</div> <!-- box -->
+
+<div class="box">
+
+<a name="references"></a>
+<h2>What did others do with the data? <a href="#references">#</a></h2>
+
+<p>
+We wrote a couple of applications, and researchers wrote research papers
+using the Tor network data provided here.
+The following list is not at all exhaustive:
+</p>
+
+<ul>
+<li>The metrics portal shows graphs of
+<a href="https://metrics.torproject.org/network.html">network growth over
+time</a> and <a href="https://metrics.torproject.org/users.html">estimates
+of users derived from directory activity</a>.</li>
+<li>The <a href="https://exonerator.torproject.org/">ExoneraTor
+service</a> allows people to look up whether a given IP address was part
+of the Tor network in the past.</li>
+<li>The websites <a href="https://atlas.torproject.org/">Atlas</a>,
+<a href="https://globe.torproject.org/">Globe</a>, and
+<a href="https://compass.torproject.org/">Compass</a> let users explore
+how specific relays or bridges contribute to the Tor network.
+They all use <a href="https://onionoo.torproject.org/">Onionoo</a> as
+their data back-end service which in turn uses the Tor network data
+provided here.</li>
+<li>The <a href="https://shadow.github.io/">Shadow Simulator</a> uses
+archived Tor directory data to generate network topologies that match the
+real Tor network as closely as possible.</li>
+<li>The <a href="https://torps.github.io/">Tor Path Simulator</a> uses Tor
+directory archive data to simulate the effect of changes to Tor's path
+selection algorithm.</li>
+</ul>
+
+<p>
+If you wrote an application or research paper that uses Tor network data
+and that is not yet listed here, <a href="#support">please let us
+know</a>!
+Please include a short description of what your application does or what your
+research was about.
+</p>
</div> <!-- box -->
<div class="box">
-<a name="next"></a>
-<h3>What's next? <a href="#next">#</a></h3>
+<a name="support"></a>
+<h2>How can I get support? <a href="#support">#</a></h2>
<p>
-Do you need support?
-If you have any questions or feedback about the Tor network data provided
-here, we'd like to hear from you!
-Please send mail to the
-<a href="mailto:tor-dev at lists.torproject.org">Tor development mailing
-list</a>.
+If you have any questions about the Tor network data provided here, we'd
+like to <a href="mailto:help at rt.torproject.org">hear from you</a>!
+Of course, suggestions or other feedback are welcome, too.
</p>
</div>
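
A note on the IPv4 sanitizing step documented in the new formats.html:
the Python sketch below shows one way the 10.x.x.x replacement could be
computed. It assumes that "|" in H(IP address | bridge identity | secret)
means plain byte concatenation and that the three output bytes are
written as decimal octets. The function name sanitize_ipv4 and the input
values in the usage line are made up for illustration and are not taken
from the metrics-db code.

```python
import hashlib
import ipaddress


def sanitize_ipv4(address: str, fingerprint_hex: str, secret: bytes) -> str:
    """Return a 10.x.x.x address derived from SHA-256 as described above.

    Assumption: the three hash inputs are concatenated as raw bytes.
    """
    ip_bytes = ipaddress.IPv4Address(address).packed  # 4-byte binary form
    identity = bytes.fromhex(fingerprint_hex)         # 20-byte fingerprint
    if len(identity) != 20:
        raise ValueError("bridge identity must be 20 bytes")
    if len(secret) != 31:
        raise ValueError("IPv4 secret must be 31 bytes")
    digest = hashlib.sha256(ip_bytes + identity + secret).digest()
    # [:3] picks the 3 most significant (first) bytes of the hash output.
    return "10." + ".".join(str(byte) for byte in digest[:3])
```

For example, sanitize_ipv4("198.51.100.7", "01" * 20, bytes(31)) yields
the same 10.x.x.x address for as long as the secret stays the same, which
matches the page's statement that the secret changes once per month.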
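
A second note, on the @type annotation rule in formats.html: a tool could
apply that rule roughly as in the Python sketch below. The table
SUPPORTED_MAJOR, the function name check_annotation, and the three return
strings are hypothetical. The page does not say what to do with a known
type and a lower major version, so the sketch treats that case as "at own
risk", which is an assumption.

```python
# Hypothetical table: descriptor type -> major version this tool understands.
SUPPORTED_MAJOR = {
    "server-descriptor": 1,
    "bridge-extra-info": 1,
    "torperf": 1,
}


def check_annotation(first_line: str) -> str:
    """Classify a file by its '@type $descriptortype $major.$minor' line."""
    parts = first_line.strip().split()
    if len(parts) != 3 or parts[0] != "@type":
        return "own-risk"          # no meta data at all
    name, version = parts[1], parts[2]
    major_text, dot, minor_text = version.partition(".")
    if not (dot and major_text.isdigit() and minor_text.isdigit()):
        return "own-risk"          # malformed version, treated like missing
    if name not in SUPPORTED_MAJOR:
        return "own-risk"          # unknown descriptor type
    major = int(major_text)
    if major == SUPPORTED_MAJOR[name]:
        return "safe"              # same major version: safe to parse
    if major > SUPPORTED_MAJOR[name]:
        return "do-not-parse"      # higher major version: should not parse
    return "own-risk"              # lower major version: assumption, see above
```

With this table, "@type bridge-extra-info 1.2" is classified as safe
because only the minor version differs from what the tool knows.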