[tor-commits] [torspec/master] Bandwidth-measurement file specification, as sent to tor-dev

nickm at torproject.org nickm at torproject.org
Mon May 14 18:31:26 UTC 2018


commit 9465f9c0713b2185643a976ef14e0959d3885e80
Author: juga0 <juga at riseup.net>
Date:   Mon May 14 14:31:05 2018 -0400

    Bandwidth-measurement file specification, as sent to tor-dev
---
 bandwidth-file-spec.txt | 412 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 412 insertions(+)

diff --git a/bandwidth-file-spec.txt b/bandwidth-file-spec.txt
new file mode 100644
index 0000000..0755329
--- /dev/null
+++ b/bandwidth-file-spec.txt
@@ -0,0 +1,412 @@
+                  Tor Bandwidth List Format
+                            juga
+                            teor
+
+1. Scope and preliminaries
+
+  This document describes the format of Tor's Bandwidth List,
+  versions 1.0.0, 1.1.0 and later.
+  It is a new specification for the existing format 1.0.0.
+  It also describes a new format, 1.1.0, which is backwards compatible
+  with 1.0.0 parsers.
+
+  Since Tor version 0.2.4.12-alpha the directory authorities use
+  the Bandwidth List file called "V3BandwidthsFile" generated by
+  Torflow [1]. The format is described in Torflow's README.spec.txt and
+  is considered to be version 1.0.0.
+
+    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+    NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
+    "OPTIONAL" in this document are to be interpreted as described in
+    RFC 2119.
+
+1.2. Acknowledgements
+
+  The original bandwidth generator (Torflow) and format were
+  created by mike. Teor suggested writing this specification while
+  contributing to pastly's new bandwidth generator implementation.
+
+  This specification was revised after feedback from:
+
+    Nick Mathewson (nickm)
+    Iain Learmonth (irl)
+
+1.3. Outline
+
+  Sections 3.4.1 and 3.4.2 of the Tor directory protocol
+  (dir-spec.txt [3]) use the term "bandwidth measurements" to refer
+  to what is called a Bandwidth List here.
+  A Bandwidth List file contains information on relays' bandwidth
+  capacities and is produced by bandwidth generators, previously known
+  as bandwidth scanners.
+
+1.4. Format Versions
+
+   1.0.0 - The legacy fallback Bandwidth List format
+
+   1.1.0 - Adds KeyValue Lines to the Header List section, KeyValues
+           to RelayLines, and format versions.
+
+  All Tor versions can consume format version 1.0.0.
+  All Tor versions can consume format version 1.1.0,
+  but they warn on additional header Lines.
+  [TODO: this might be fixed; if it is, say in which version of Tor
+  it was fixed]
+
+2. Format details
+
+  The Bandwidth List MUST contain the following sections:
+  - Header List (exactly once)
+  - Relays' Bandwidth List (zero or more times)
+  If it does not contain these sections, parsers SHOULD ignore the file.
+
+2.1. Definitions
+
+  The following nonterminals are defined in Tor directory protocol
+  sections 1.2., 2.1.1., 2.1.3.:
+
+    Int
+    SP (space)
+    NL (newline)
+    Keyword
+    ArgumentChar
+    nickname
+    hexdigest (a '$', followed by 40 hexadecimal characters
+      ([A-Fa-f0-9]))
+
+  The following nonterminal is defined in section 2 of version-spec.txt [4]:
+
+    version_number
+
+  We define the following nonterminals:
+
+    Line ::= ArgumentChar* NL
+    RelayLine ::= KeyValue (SP KeyValue)* NL
+    KeyValue ::= Keyword "=" Value
+    Value ::= ArgumentCharValue+
+    ArgumentCharValue ::= any printing ASCII character except NL and SP.
+    Terminator ::= "====="
+    Timestamp ::= Int
+    Bandwidth ::= Int
+    MasterKey ::= a base64-encoded Ed25519 public key, with
+    padding characters omitted.
+    DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
+
+  Note that key_value and value are defined in Tor directory protocol
+  with different formats to KeyValue and Value here.
+
+  All Lines in the file MUST be 510 characters or fewer, to allow for
+  the trailing newline and NUL characters.
+  The previous limit was 254 characters in Tor 0.2.6.2-alpha and
+  earlier.
+  The parser MAY ignore longer Lines.
+  [TODO: Change this restriction in 1.1.0 or later]
+
+2.2. Header List format
+
+Some header Lines MUST appear in specific positions, as documented
+below.
+All other Lines can appear in any order.
+If a parser does not recognize any extra material in a header Line,
+the Line MUST be ignored.
+If a header Line does not conform to this format, the Line SHOULD be
+ignored by parsers.
+
+The Header List consists of:
+
+  Timestamp NL
+
+    [At start, exactly once.]
+
+    The Unix Epoch time in seconds when the file was created.
+    It does not follow the KeyValue format for backwards
+    compatibility with version 1.0.0.
+
+  "version=" version_number NL
+
+    [In second position, zero or one time.]
+
+    The specification document format version.
+    It uses semantic versioning [5].
+
+    This Line has been added in version 1.1.0 of this specification.
+
+    Version 1.0.0 documents do not contain this Line, and the
+    version_number is considered to be "1.0.0".
+
+  "software=" Value NL
+
+    [Zero or one time.]
+
+    The name of the software that created the document.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+    Version 1.0.0 documents do not contain this Line, and the software
+    is considered to be "torflow".
+
+  "software_version=" Value NL
+
+    [Zero or one time.]
+
+    The version of the software that created the document.
+    The version may be a version_number, a git commit, or some other
+    version scheme.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+  "generator_started=" DateTime NL
+
+    [Zero or one time.]
+
+    The date and time, in ISO 8601 format and the UTC time zone, when
+    the generator started.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+  "earliest_bandwidth=" DateTime NL
+
+    [Zero or one time.]
+
+    The date and time, in ISO 8601 format and the UTC time zone, when
+    the first relay bandwidth was obtained.
+
+    This Line has been added in version 1.1.0 of this specification.
+
+  KeyValue NL
+
+    [Zero or more times.]
+
+    There MUST NOT be multiple KeyValue header Lines with the same key.
+    If there are, the parser SHOULD choose an arbitrary Line.
+
+    If a parser does not recognize a Keyword in a KeyValue Line, it
+    MUST be ignored.
+
+    Future format versions may include additional KeyValue header Lines.
+    Additional header Lines will be accompanied by a minor version
+    increment.
+
+    Implementations MAY add additional header Lines as needed. This
+    specification SHOULD be updated to avoid conflicting meanings for
+    the same header keys.
+
+    Parsers MUST NOT rely on the order of these additional Lines.
+
+    Additional header Lines MUST NOT use any keywords specified in the
+    relay measurements format.
+    If they do, the parser MAY ignore the conflicting keywords.
+
+  Terminator NL
+
+    [Zero or one time.]
+
+    The Header List section ends with this Terminator.
+
+    In version 1.0.0, Header List ends when the first relay bandwidth
+    is found conforming to the next section.
+    Implementations of version 1.1.0 SHOULD include this Line.
+
+2.3. Relays' Bandwidth List format
+
+The Relays' Bandwidth List consists of zero or more RelayLines
+containing the relays' bandwidths, in arbitrary order.
+
+There MUST NOT be multiple KeyValue pairs with the same key in the same
+RelayLine.
+If there are, the parser SHOULD choose an arbitrary Value.
+
+There MUST NOT be multiple RelayLines per relay identity (node_id or
+master_key_ed25519).
+If there are, parsers SHOULD issue a warning and MAY choose an arbitrary
+value or ignore both values.
+
+If a parser does not recognize any extra material in a RelayLine,
+the extra material MUST be ignored.
+
+Each RelayLine MUST include the following KeyValue pairs.
+In version 1.0.0, node_id MUST NOT be at the end of the Line.
+In version 1.1.0, the KeyValue pairs can be in any order.
+[TODO: list of Tor versions that support it, when it's done]
+
+  "node_id=" hexdigest
+
+    [Exactly once.]
+
+    The fingerprint for the relay's RSA identity key.
+
+  "master_key_ed25519=" MasterKey
+
+    [Zero or one time.]
+
+    The relay's master Ed25519 key, base64 encoded,
+    without trailing "="s, to avoid ambiguity with the KeyValue "="
+    character.
+
+    Implementations of version 1.1.0 SHOULD include both node_id and
+    master_key_ed25519.
+    Parsers SHOULD accept Lines that contain at least one of them.
+
+  "bw=" Bandwidth
+
+    [Exactly once.]
+
+    The measured bandwidth of this relay.
+
+    Tor accepts zero bandwidths, but they trigger bugs in older Tor
+    implementations. Therefore, implementations SHOULD NOT produce zero
+    bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
+    If there are zero bandwidths, the parser MAY ignore them.
+
+    Multiple measurements can be aggregated using an averaging scheme,
+    such as a mean, median, or decaying average.
+
+    Torflow scales bandwidths to kilobytes per second. Other
+    implementations SHOULD use kilobytes per second for their initial
+    bandwidth scaling.
+
+    If different implementations or configurations are used in votes for
+    the same network, their measurements MAY need further scaling. See
+    Appendix B for information about scaling, and one possible scaling
+    method.
+
+  KeyValue
+
+    [Zero or more times.]
+
+    Future format versions may include additional KeyValue pairs on a
+    RelayLine.
+    Additional KeyValue pairs will be accompanied by a minor version
+    increment.
+
+    Implementations MAY add additional relay KeyValue pairs as needed.
+    This specification SHOULD be updated to avoid conflicting meanings
+    for the same Keywords.
+
+    Parsers MUST NOT rely on the order of these additional KeyValue
+    pairs.
+
+    Additional KeyValue pairs MUST NOT use any keywords specified in the
+    header format.
+    If they do, the parser MAY ignore the conflicting keywords.
+
+2.4. Implementation notes
+
+Current implementations generate the following RelayLine KeyValue pairs.
+
+2.4.1. Simple Bandwidth Scanner
+
+Every RelayLine in sbws version 0.1.0 consists of:
+
+  "node_id=" hexdigest SP
+
+    As above.
+
+  "bw=" Bandwidth SP
+
+    As above.
+
+  "nick=" nickname SP
+
+    [Exactly once.]
+
+    The relay nickname.
+
+  "rtt=" Int SP
+
+    [Exactly once.]
+
+    The Round Trip Time in milliseconds to obtain 1 byte of data.
+
+  "time=" DateTime NL
+
+    [Exactly once.]
+
+    The date and time, in ISO 8601 format and the UTC time zone, when
+    the last bandwidth was obtained.
+
+2.4.2. Torflow
+
+Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
+
+References:
+
+1. https://gitweb.torproject.org/torflow.git
+2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
+3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
+4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
+5. https://semver.org/
+
+A. Sample data
+
+The following has not been obtained from any real measurement.
+
+A.1. Generated by Torflow
+
+This is an example version 1.0.0 document:
+
+1523911758
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
+node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
+
+A.2. Generated by sbws version 0.1.X
+[TODO: this needs to be implemented when this spec is finished]
+
+1523911758
+version=1.1.0
+software=sbws
+software_version=0.1.0
+generator_started=2018-05-08T16:13:25
+earliest_bandwidth=2018-05-08T16:13:26
+=====
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
+node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36
+
+B. Scaling bandwidths
+
+B.1. Scaling requirements
+
+Tor accepts zero bandwidths, but they trigger bugs in older Tor
+implementations. Therefore, scaling methods SHOULD perform the
+following checks:
+ * If the total bandwidth is zero, all relays should be given equal
+   bandwidths.
+ * If the scaled bandwidth is zero, it should be rounded up to one.
+
+Initial experiments indicate that scaling may not be needed for
+torflow and sbws, because their measured bandwidths are similar
+enough already.
+
+B.2. A linear scaling method
+
+If scaling is required, here is a simple linear bandwidth scaling
+method, which ensures that all bandwidth votes contain approximately
+the same total bandwidth:
+
+1. Calculate the relay quota by dividing the total measured bandwidth
+   in all votes, by the number of relays with measured bandwidth
+   votes. In the public tor network, this is approximately 7500 as of
+   April 2018. The quota should be a consensus parameter, so it can be
+   adjusted for all generators on the network.
+
+2. Calculate a vote quota by multiplying the relay quota by the number
+   of relays this bandwidth authority has measured
+   bandwidths for.
+
+3. Calculate a scaling factor by dividing the vote quota by the
+   total unscaled measured bandwidth in this bandwidth
+   authority's upcoming vote.
+
+4. Multiply each unscaled measured bandwidth by the scaling
+   factor.
+
+Now, the total scaled bandwidth in the upcoming vote is
+approximately equal to the vote quota.
+
+B.3. Quota changes
+
+If all generators are using scaling, the quota can be gradually
+reduced or increased as needed. Smaller quotas decrease the size
+of uncompressed consensuses, and may decrease the size of
+consensus diffs and compressed consensuses. But if the relay
+quota is too small, some relays may be over- or under-weighted.
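
Illustration (not part of the commit above): the linear scaling method of
section B.2, combined with the zero-bandwidth checks of section B.1, could be
sketched in Python roughly as follows. The function name scale_bandwidths and
the relay_quota argument are hypothetical. The relay quota (step 1) is assumed
to be supplied by the caller, since computing it needs data from all votes.
The specification does not say which equal value to use when the total is
zero; giving each relay the relay quota is an assumption made here.

```python
def scale_bandwidths(measured, relay_quota):
    """Scale one generator's measured bandwidths (sketch of B.2, steps 2-4).

    measured: dict mapping a relay identity to its unscaled bandwidth.
    relay_quota: target average bandwidth per relay (step 1), assumed
    to be provided by the caller.
    """
    if not measured:
        return {}

    total = sum(measured.values())
    if total == 0:
        # B.1: if the total bandwidth is zero, give all relays equal
        # bandwidths. Assumption: use the relay quota, but never zero.
        equal = max(1, round(relay_quota))
        return {relay: equal for relay in measured}

    # Step 2: vote quota = relay quota * number of measured relays.
    vote_quota = relay_quota * len(measured)
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total
    # Step 4: scale each bandwidth; B.1: round zero results up to one.
    return {relay: max(1, round(bw * factor))
            for relay, bw in measured.items()}
```

With a relay quota of 7500 and two relays measured at 100 and 300, the vote
quota is 15000 and the scaling factor is 37.5, so the scaled bandwidths are
3750 and 11250.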

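Illustration (not part of the commit above): a tolerant parser for the format
in sections 2.2 and 2.3 might look like the following Python sketch. All names
in it are hypothetical. It assumes that keys consist of letters, digits, "-"
and "_", it keeps the first Value when a key is repeated (the specification
allows an arbitrary choice), and it does not validate hexdigest, MasterKey or
DateTime values. It is not how Tor itself parses the file.

```python
import re

TERMINATOR = "====="
MAX_LINE_LEN = 510  # limit from section 2.1, excluding the newline
INT_RE = re.compile(r"[0-9]+")
# Assumption: keys are letters, digits, "-" and "_"; Values are printing
# ASCII characters without spaces.
KEYVALUE_RE = re.compile(r"([A-Za-z0-9_-]+)=([!-~]+)")


def parse_keyvalues(line):
    """Return the KeyValue pairs on a Line, skipping unrecognized material."""
    pairs = {}
    for token in line.split(" "):
        match = KEYVALUE_RE.fullmatch(token)
        # For duplicate keys, keep the first Value (an arbitrary choice).
        if match and match.group(1) not in pairs:
            pairs[match.group(1)] = match.group(2)
    return pairs


def is_relay(pairs):
    """A RelayLine needs an integer bw and at least one relay identity."""
    has_id = "node_id" in pairs or "master_key_ed25519" in pairs
    return has_id and INT_RE.fullmatch(pairs.get("bw", "")) is not None


def parse_bandwidth_list(text):
    """Return (header, relays), or None if the file should be ignored."""
    # Skip empty Lines and Lines over the length limit.
    lines = [l for l in text.split("\n") if l and len(l) <= MAX_LINE_LEN]
    if not lines or not INT_RE.fullmatch(lines[0]):
        return None  # the Timestamp must come first
    header, relays = {}, []
    in_header = True
    for line in lines[1:]:
        if in_header and line == TERMINATOR:
            in_header = False
            continue
        pairs = parse_keyvalues(line)
        if is_relay(pairs):
            in_header = False  # 1.0.0: the first RelayLine ends the header
            relays.append(dict(pairs, bw=int(pairs["bw"])))
        elif in_header and KEYVALUE_RE.fullmatch(line):
            key, value = next(iter(pairs.items()))
            header.setdefault(key, value)
        # Anything else is non-conforming and is ignored.
    # Defaults for version 1.0.0 documents, which lack these Lines.
    header.setdefault("version", "1.0.0")
    header.setdefault("software", "torflow")
    header["timestamp"] = int(lines[0])
    return header, relays
```

Because the first RelayLine also ends the header, the same function handles
version 1.0.0 documents, which have no Terminator Line.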

