[tor-commits] [metrics-web/master] Move contents from Statistics page to text file.
karsten at torproject.org
Thu Jun 26 14:48:14 UTC 2014
commit 84336ccac18d3a6347d5768e067ca5d9719d917e
Author: Karsten Loesing <karsten.loesing at gmx.net>
Date: Thu Jun 26 16:06:18 2014 +0200
Move contents from Statistics page to text file.
The Statistics page is more like a spec, and it's likely only interesting
for 5% of visitors. Let's not overwhelm the remaining 95% with something
they don't care about.
---
doc/stats-spec.txt | 264 ++++++++++++++++++++++++++++++++++++++++
website/web/WEB-INF/banner.jsp | 3 -
website/web/WEB-INF/error.jsp | 1 -
website/web/WEB-INF/stats.jsp | 6 +
4 files changed, 270 insertions(+), 4 deletions(-)
diff --git a/doc/stats-spec.txt b/doc/stats-spec.txt
new file mode 100644
index 0000000..a0c45c3
--- /dev/null
+++ b/doc/stats-spec.txt
@@ -0,0 +1,264 @@
+Statistics produced by Tor Metrics
+==================================
+
+Tor Metrics aggregates large amounts of Tor network data and visualizes
+results in customizable graphs and tables. All aggregated data are also
+available for download, so that people can easily plot their own graphs or
+even develop a prettier metrics website without writing their own data
+aggregation code. Data formats of aggregate statistics are specified
+below.
+
+Statistics files are available for download at:
+
+ https://metrics.torproject.org/stats/
+
+
+Number of relays and bridges
+----------------------------
+
+Statistics file servers.csv contains the average number of relays and
+bridges in the Tor network. All averages are calculated per day by
+evaluating the relay and bridge lists published by the directory
+authorities. Statistics include subsets of relays or bridges by relay
+flag (only relays), country code (only relays, only until February 2013),
+Tor software version (only relays), operating system (only relays), and
+EC2 cloud (only bridges). The statistics file contains the following
+columns:
+
+ - date: UTC date (YYYY-MM-DD) when relays or bridges were listed as
+ running.
+
+ - flag: Relay flag assigned by the directory authorities. Examples are
+ "Exit", "Guard", "Fast", "Stable", and "HSDir". Relays can have none,
+ some, or all of these relay flags assigned. Relays that don't have the
+ "Running" flag are not included in these statistics regardless of their
+ other flags. If this column contains the empty string, all running
+ relays are included, regardless of assigned flags. There are no
+ statistics on the number of bridges by relay flag.
+
+ - country: Two-letter lower-case country code as found in a GeoIP
+ database by resolving the relay's first onion-routing IP address, or
+ "??" if the IP address could not be resolved. If this column contains
+ the empty string, all running relays are included, regardless of their
+ resolved country code. Statistics on relays by country code are only
+ available until January 31, 2013. There are no statistics on the
+ number of bridges by country code.
+
+ - version: First three dotted numbers of the Tor software version as
+ reported by the relay. An example is "0.2.5". If this column contains
+ the empty string, all running relays are included, regardless of the
+ Tor software version they run. There are no statistics on the number
+ of bridges by Tor software version.
+
+ - platform: Operating system as reported by the relay. Examples are
+ "Linux", "Darwin" (Mac OS X), "FreeBSD", "Windows", and "Other". If
+ this column contains the empty string, all running relays are included,
+ regardless of the operating system they run on. There are no
+ statistics on the number of bridges by operating system.
+
+ - ec2bridge: Whether bridges are running in the EC2 cloud or not. More
+ precisely, bridges in the EC2 cloud running an image provided by Tor by
+ default set their nickname to "ec2bridger" plus 8 random hex
+ characters. This column either contains "t" for bridges matching this
+ naming scheme, or the empty string for all bridges regardless of their
+ nickname. There are no statistics on the number of relays running in
+ the EC2 cloud.
+
+ - relays: The average number of relays matching the criteria in the
+ previous columns. If the values in previous columns are specific to
+ bridges only, this column contains the empty string.
+
+ - bridges: The average number of bridges matching the criteria in the
+ previous columns. If the values in previous columns are specific to
+ relays only, this column contains the empty string.
+
+
+Bandwidth provided and consumed by relays
+-----------------------------------------
+
+Statistics on bandwidth provided and consumed by relays are contained in
+file bandwidth.csv. This file contains three different bandwidth metrics:
+(1) bandwidth that relays are capable of providing, and bandwidth that
+relays report having consumed, either (2) for any traffic or (3) only for
+traffic from serving directory data. Relays providing bandwidth statistics
+are categorized by having the "Exit" relay flag, the "Guard" relay flag,
+both, or neither. The statistics file contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) that relays reported bandwidth data for.
+
+ - isexit: Whether relays included in this line have the "Exit" relay flag
+ or not, which can be "t" or "f". If this column contains the empty
+ string, bandwidth data from all running relays are included, regardless
+ of assigned relay flags.
+
+ - isguard: Whether relays included in this line have the "Guard" relay
+ flag or not, which can be "t" or "f". If this column contains the
+ empty string, bandwidth data from all running relays are included,
+ regardless of assigned relay flags.
+
+ - advbw: Total advertised bandwidth in bytes per second that relays are
+ capable of providing.
+
+ - bwread: Total bandwidth in bytes per second that relays have read.
+ This metric includes any kind of traffic.
+
+ - bwwrite: Similar to bwread, but for traffic written by relays.
+
+ - dirread: Bandwidth in bytes per second that relays have read when
+ serving directory data. Not all relays report how many bytes they read
+ when serving directory data, which is why this value is an estimate
+ based on the available data. This metric is not available for subsets
+ of relays with certain relay flags, so this column contains the empty
+ string if either isexit or isguard is non-empty.
+
+ - dirwrite: Similar to dirread, but for traffic written by relays when
+ serving directory data.
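The rule that dirread and dirwrite are only filled in when both isexit and isguard are empty can be illustrated with a small Python sketch that computes, per day, the share of written bytes that served directory data. It assumes a header row with the column names above; `directory_share` is a hypothetical helper. Since dirwrite is itself an estimate, the resulting ratio is approximate as well.

```python
import csv
from typing import Dict, Iterable, Optional


def _to_float(value: str) -> Optional[float]:
    """Convert a CSV cell to a float, mapping the empty string to None."""
    return float(value) if value != "" else None


def directory_share(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map each date to dirwrite / bwwrite for the all-relays row.

    Rows restricted by isexit or isguard are skipped because their dirread
    and dirwrite columns are empty. The result is None when a value is
    missing or bwwrite is zero.
    """
    shares: Dict[str, Optional[float]] = {}
    for row in csv.DictReader(lines):
        if row["isexit"] != "" or row["isguard"] != "":
            continue
        written = _to_float(row["bwwrite"])
        dir_written = _to_float(row["dirwrite"])
        if written and dir_written is not None:
            shares[row["date"]] = dir_written / written
        else:
            shares[row["date"]] = None
    return shares
```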
+
+
+Advertised bandwidth distribution and n-th fastest relays
+---------------------------------------------------------
+
+Statistics file advbwdist.csv contains statistics on the advertised
+bandwidth of relays in the network. These statistics include advertised
+bandwidth percentiles and advertised bandwidth values of the n-th fastest
+relays. The statistics file contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) when relays were listed as running.
+
+ - isexit: Whether relays included in this line have the "Exit" relay
+ flag, which would be indicated as "t". If this column contains the
+ empty string, advertised bandwidths from all running relays are
+ included, regardless of assigned relay flags.
+
+ - relay: Position of the relay in an ordered list of all advertised
+ bandwidths, starting at 1 for the fastest relay in the network. May be
+ the empty string if this line contains advertised bandwidth by
+ percentile.
+
+ - percentile: Advertised bandwidth percentile given in this line. May be
+ the empty string if this line contains advertised bandwidth by fastest
+ relays.
+
+ - advbw: Advertised bandwidth in B/s.
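Each line of advbwdist.csv carries either an n-th fastest relay position or a percentile, never both. The following Python sketch separates the two kinds of lines into lookup tables. It assumes a header row and integer values in the relay and percentile columns; `split_advbwdist` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, Tuple

# Keys are (date, isexit, rank-or-percentile); values are advbw in B/s.
Table = Dict[Tuple[str, str, int], float]


def split_advbwdist(lines: Iterable[str]) -> Tuple[Table, Table]:
    """Split advbwdist.csv rows into n-th-fastest and percentile tables.

    Assumes a header row and integer values in the relay and percentile
    columns; exactly one of the two is expected to be non-empty per row.
    """
    by_rank: Table = {}
    by_percentile: Table = {}
    for row in csv.DictReader(lines):
        advbw = float(row["advbw"])
        if row["relay"] != "":
            by_rank[(row["date"], row["isexit"], int(row["relay"]))] = advbw
        elif row["percentile"] != "":
            key = (row["date"], row["isexit"], int(row["percentile"]))
            by_percentile[key] = advbw
    return by_rank, by_percentile
```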
+
+
+Estimated number of clients in the Tor network
+----------------------------------------------
+
+Statistics file clients.csv contains estimates of the number of clients in
+the Tor network. These estimates are based on the number of directory
+requests counted on directory mirrors and bridges. Statistics are
+available for clients connecting directly to the Tor network and for
+clients connecting via bridges. For relays, statistics on the number of
+clients are available by country; for bridges, statistics are available
+by country, by transport, and by IP version. Statistics further include
+expected client numbers derived from past observations, which can be used
+to detect censorship or the release of censorship. The statistics file
+contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) for which client numbers are estimated.
+
+ - node: The node type to which clients connect first, which can be either
+ "relay" or "bridge".
+
+ - country: Two-letter lower-case country code as found in a GeoIP
+ database by resolving clients' IP addresses, or "??" if client IP
+ addresses could not be resolved. If this column contains the empty
+ string, all clients are included, regardless of their country code.
+
+ - transport: Transport name used by clients to connect to the Tor network
+ using bridges. Examples are "obfs2", "obfs3", "websocket", or "<OR>"
+ (original onion routing protocol). If this column contains the empty
+ string, all clients are included, regardless of their transport. There
+ are no statistics on the number of clients by transport that connect to
+ the Tor network via relays.
+
+ - version: IP version used by clients to connect to the Tor network using
+ bridges. Examples are "v4" and "v6". If this column contains the
+ empty string, all clients are included, regardless of their IP version.
+ There are no statistics on the number of clients by IP version that
+ connect directly to the Tor network using relays.
+
+ - lower: Lower bound of the expected number of clients under the
+ assumption that there has been no censorship event. If this column
+ contains the empty string, there are no expectations on the number of
+ clients.
+
+ - upper: Upper bound of the expected number of clients under the
+ assumption that there has been no release of censorship. If this column
+ contains the empty string, there are no expectations on the number of
+ clients.
+
+ - clients: Estimated number of clients.
+
+ - frac: Fraction of relays or bridges, in percent, that the estimate is
+ based on. The higher this value, the more reliable the estimate.
+ Values above 50 can be considered reliable enough for most purposes;
+ lower values should be handled with more care.
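The lower, upper, and frac columns can be combined to flag days that may indicate censorship or its release. The following Python sketch treats an estimate below lower as possible censorship and one above upper as a possible release, and ignores rows whose frac is not above 50. It assumes a header row with the column names above; `flag_anomalies` and its `min_frac` parameter are hypothetical, and the labels are only a heuristic, not a Tor Metrics detector.

```python
import csv
from typing import Iterable, List, Tuple


def flag_anomalies(
    lines: Iterable[str], min_frac: float = 50.0
) -> List[Tuple[str, str, str, str]]:
    """Return (date, node, country, label) for estimates outside [lower, upper].

    Rows without expectations or without a clients value are skipped, and so
    are rows whose frac is not above min_frac.
    """
    events: List[Tuple[str, str, str, str]] = []
    for row in csv.DictReader(lines):
        if "" in (row["lower"], row["upper"], row["clients"], row["frac"]):
            continue
        if float(row["frac"]) <= min_frac:
            continue
        clients = float(row["clients"])
        if clients < float(row["lower"]):
            label = "possible censorship"
        elif clients > float(row["upper"]):
            label = "possible release of censorship"
        else:
            continue
        events.append((row["date"], row["node"], row["country"], label))
    return events
```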
+
+
+Performance of downloading static files over Tor
+------------------------------------------------
+
+Statistics file torperf.csv contains aggregate statistics on download
+performance over time. These statistics come from the Torperf service
+that periodically downloads static files over Tor. The statistics file
+contains the following columns:
+
+ - date: UTC date (YYYY-MM-DD) when download performance was measured.
+
+ - size: Size of the downloaded file in bytes.
+
+ - source: Name of the Torperf service performing measurements. If this
+ column contains the empty string, all measurements are included,
+ regardless of which Torperf service performed them. Examples are
+ "moria", "siv", and "torperf".
+
+ - q1: First quartile of the time until receiving the last byte, in
+ milliseconds.
+
+ - md: Median of the time until receiving the last byte, in milliseconds.
+
+ - q3: Third quartile of the time until receiving the last byte, in
+ milliseconds.
+
+ - timeouts: Number of timeouts that occurred when attempting to download
+ the static file over Tor.
+
+ - failures: Number of failures that occurred when attempting to download
+ the static file over Tor.
+
+ - requests: Total number of requests made to download the static file
+ over Tor.
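As an illustration of how these columns relate, the following Python sketch reports the median download time in seconds and the share of requests that timed out or failed, for one Torperf source (the empty string selects all sources). It assumes a header row, integer timeouts, failures, and requests values, and that each request is counted at most once as a timeout or a failure; `summarize_torperf` is a hypothetical helper.

```python
import csv
from typing import Dict, Iterable, Optional, Tuple


def summarize_torperf(
    lines: Iterable[str], source: str = ""
) -> Dict[Tuple[str, int], Dict[str, Optional[float]]]:
    """Map (date, size) to the median download time in seconds and the
    share of requests that timed out or failed, for one source."""
    summary: Dict[Tuple[str, int], Dict[str, Optional[float]]] = {}
    for row in csv.DictReader(lines):
        if row["source"] != source:
            continue
        requests = int(row["requests"])
        unsuccessful = int(row["timeouts"]) + int(row["failures"])
        summary[(row["date"], int(row["size"]))] = {
            # md is given in milliseconds; an empty cell becomes None.
            "median_s": float(row["md"]) / 1000.0 if row["md"] else None,
            "unsuccessful": unsuccessful / requests if requests else None,
        }
    return summary
```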
+
+
+Fraction of connections used uni-/bidirectionally
+-------------------------------------------------
+
+Statistics file connbidirect.csv contains statistics on the fraction of
+connections that are used uni- or bidirectionally. Every 10 seconds,
+relays determine for every connection whether they read and wrote less
+than a threshold of 20 KiB. For the remaining connections, relays report
+whether they read/wrote at least 10 times as many bytes as they
+wrote/read. If so, they classify a connection as "mostly reading" or
+"mostly writing," respectively. All other connections are classified as
+"both reading and writing." After classifying connections, read and write
+counters are reset for the next 10-second interval. Statistics are
+aggregated over 24 hours. The statistics file contains the following
+columns:
+
+ - date: UTC date (YYYY-MM-DD) for which statistics on uni-/bidirectional
+ connection usage were reported.
+
+ - source: Fingerprint of the relay reporting statistics.
+
+ - below: Number of 10-second intervals of connections with less than
+ 20 KiB read and written data.
+
+ - read: Number of 10-second intervals of connections with at least 10
+ times as many read bytes as written bytes.
+
+ - write: Number of 10-second intervals of connections with at least 10
+ times as many written bytes as read bytes.
+
+ - both: Number of 10-second intervals of connections with less than
+ 10 times as many written or read bytes as in the other direction.
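The per-interval classification described above can be sketched in Python as follows. This is an illustration, not the relay code: it assumes that the 20 KiB threshold applies to the sum of read and written bytes (the text could also be read as applying to each direction separately), and `classify` and `tally` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Tuple

THRESHOLD_BYTES = 20 * 1024  # 20 KiB
FACTOR = 10  # "at least 10 times as many bytes"


def classify(read_bytes: int, written_bytes: int) -> str:
    """Classify one connection's traffic in one 10-second interval.

    Assumption: the 20 KiB threshold is applied to the sum of read and
    written bytes.
    """
    if read_bytes + written_bytes < THRESHOLD_BYTES:
        return "below"
    if read_bytes >= FACTOR * written_bytes:
        return "read"
    if written_bytes >= FACTOR * read_bytes:
        return "write"
    return "both"


def tally(intervals: Iterable[Tuple[int, int]]) -> Counter:
    """Count (read_bytes, written_bytes) samples per category, corresponding
    to the below, read, write, and both columns."""
    return Counter(classify(r, w) for r, w in intervals)
```

The fractions of uni- and bidirectional use then follow by dividing each count by the sum of all four counts.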
+
diff --git a/website/web/WEB-INF/banner.jsp b/website/web/WEB-INF/banner.jsp
index 2b27632..3a3cf5d 100644
--- a/website/web/WEB-INF/banner.jsp
+++ b/website/web/WEB-INF/banner.jsp
@@ -20,9 +20,6 @@
<a <% if (currentPage.endsWith("performance.jsp")) {
%>class="current"<%} else {%>href="/performance.html"<%}
%>>Performance</a>
- <a <% if (currentPage.endsWith("stats.jsp")) {
- %>class="current"<%} else {%>href="/stats.html"<%}
- %>>Statistics</a>
</td>
<td class="banner-right"></td>
</tr>
diff --git a/website/web/WEB-INF/error.jsp b/website/web/WEB-INF/error.jsp
index e6f1e71..bd6d442 100644
--- a/website/web/WEB-INF/error.jsp
+++ b/website/web/WEB-INF/error.jsp
@@ -46,7 +46,6 @@ Maybe you find what you're looking for on our sitemap:
<li><a href="bubbles.html">Diversity</a></li>
<li><a href="users.html">Users</a></li>
<li><a href="performance.html">Performance</a></li>
-<li><a href="stats.html">Statistics</a></li>
</ul>
</p>
diff --git a/website/web/WEB-INF/stats.jsp b/website/web/WEB-INF/stats.jsp
index d708910..005235e 100644
--- a/website/web/WEB-INF/stats.jsp
+++ b/website/web/WEB-INF/stats.jsp
@@ -13,6 +13,12 @@
<h2>Tor Metrics: Statistics</h2>
<br>
+<p><font color="red"><b>Notice:</b> The specification on this page has
+moved
+<a href="https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/doc/stats-spec.txt">here</a>.
+This page will be removed after July 26, 2014.</font>
+</p>
+
<p>Tor Metrics aggregates large amounts of Tor network
<a href="data.html">data</a> and visualizes results in customizable
<a href="graphs.html">graphs</a> and tables.