[tor-commits] [torspec/master] Bandwidth-measurement file specification, as sent to tor-dev
nickm at torproject.org
Mon May 14 18:31:26 UTC 2018
commit 9465f9c0713b2185643a976ef14e0959d3885e80
Author: juga0 <juga at riseup.net>
Date: Mon May 14 14:31:05 2018 -0400
Bandwidth-measurement file specification, as sent to tor-dev
---
bandwidth-file-spec.txt | 412 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 412 insertions(+)
diff --git a/bandwidth-file-spec.txt b/bandwidth-file-spec.txt
new file mode 100644
index 0000000..0755329
--- /dev/null
+++ b/bandwidth-file-spec.txt
@@ -0,0 +1,412 @@
+ Tor Bandwidth List Format
+ juga
+ teor
+
+1. Scope and preliminaries
+
+ This document describes the format of Tor's Bandwidth List,
+ versions 1.0.0, 1.1.0 and later.
+ It is a new specification for the existing format 1.0.0.
+ It also describes a new format, 1.1.0, which is backwards compatible
+ with 1.0.0 parsers.
+
+ Since Tor version 0.2.4.12-alpha the directory authorities use
+ the Bandwidth List file called "V3BandwidthsFile" generated by
+ Torflow [1]. The format is described in Torflow's README.spec.txt and
+ is considered to be version 1.0.0.
+
+ The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
+ NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
+ "OPTIONAL" in this document are to be interpreted as described in
+ RFC 2119.
+
+1.2. Acknowledgements
+
+ The original bandwidth generator (Torflow) and format were
+ created by mike. Teor suggested writing this specification while
+ contributing to pastly's new bandwidth generator implementation.
+
+ This specification was revised after feedback from:
+
+ Nick Mathewson (nickm)
+ Iain Learmonth (irl)
+
+1.3. Outline
+
+ The Tor directory protocol (dir-spec.txt [3]), sections 3.4.1
+ and 3.4.2, uses the term bandwidth measurements to refer to what
+ is called the Bandwidth List here.
+ A Bandwidth List file contains information on relays' bandwidth
+ capacities and is produced by bandwidth generators, previously known
+ as bandwidth scanners.
+
+1.4. Format Versions
+
+ 1.0.0 - The legacy fallback Bandwidth List format
+
+ 1.1.0 - Adds KeyValue Lines to the Header List section, adds
+ KeyValues to RelayLines, and adds format versions.
+
+ All Tor versions can consume format version 1.0.0.
+ All Tor versions can consume format version 1.1.0,
+ but they warn on additional header Lines.
+ [TODO: this might be fixed, and if it is fixed should be said which
+ version of Tor]
+
+2. Format details
+
+ The Bandwidth List MUST contain the following sections:
+ - Header List (exactly once)
+ - Relays' Bandwidth List (zero or more times)
+ If it does not contain these sections, parsers SHOULD ignore the file.
+
+2.1. Definitions
+
+ The following nonterminals are defined in Tor directory protocol
+ sections 1.2., 2.1.1., 2.1.3.:
+
+ Int
+ SP (space)
+ NL (newline)
+ Keyword
+ ArgumentChar
+ nickname
+ hexdigest (a '$', followed by 40 hexadecimal characters
+ ([A-Fa-f0-9]))
+
+ Nonterminal defined in section 2 of version-spec.txt [4]:
+
+ version_number
+
+ We define the following nonterminals:
+
+ Line ::= ArgumentChar* NL
+ RelayLine ::= KeyValue (SP KeyValue)* NL
+ KeyValue ::= Keyword "=" Value
+ Value ::= ArgumentCharValue+
+ ArgumentCharValue ::= any printing ASCII character except NL and SP.
+ Terminator ::= "====="
+ Timestamp ::= Int
+ Bandwidth ::= Int
+ MasterKey ::= a base64-encoded Ed25519 public key, with
+ padding characters omitted.
+ DateTime ::= "YYYY-MM-DDTHH:MM:SS", as in ISO 8601
+
+ Note that key_value and value are defined in Tor directory protocol
+ with different formats to KeyValue and Value here.
+
+ All Lines in the file MUST be 510 characters or fewer, to allow for the
+ trailing newline and NUL characters.
+ The previous limit was 254 characters in Tor 0.2.6.2-alpha and
+ earlier.
+ The parser MAY ignore longer Lines.
+ [TODO: Change this restriction in 1.1.0 or later]
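The grammar above can be sketched as a small tokenizer. This is only an illustration, not part of the specification: the function name parse_relay_line is hypothetical, and the key pattern additionally allows "_" (an assumption, since keys such as node_id use it while the dir-spec Keyword does not). On duplicate keys it keeps the first Value, which is one arbitrary choice among those the spec permits.

```python
import re

MAX_LINE_CHARS = 510  # limit from this section, not counting the newline

# Keyword "=" Value, where Value is printing ASCII without SP or NL.
# Allowing "_" in the key is an assumption made for this sketch.
KEYVALUE_RE = re.compile(r"([A-Za-z0-9_-]+)=([\x21-\x7e]+)")


def parse_relay_line(line):
    """Return a dict of KeyValue pairs, or None if the Line does not
    match RelayLine or is too long."""
    body = line.rstrip("\n")
    if len(body) > MAX_LINE_CHARS:
        return None
    pairs = {}
    for token in body.split(" "):
        match = KEYVALUE_RE.fullmatch(token)
        if match is None:
            return None
        # Keep the first Value seen for a repeated key.
        pairs.setdefault(match.group(1), match.group(2))
    return pairs
```

Because the key pattern cannot contain "=", a token such as k=a=b splits at the first "=", giving the Value a=b.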
+
+2.2. Header List format
+
+Some header Lines MUST appear in specific positions, as documented
+below.
+All other Lines can appear in any order.
+If a parser does not recognize any extra material in a header Line,
+the Line MUST be ignored.
+If a header Line does not conform to this format, the Line SHOULD be
+ignored by parsers.
+
+The Header List consists of:
+
+ Timestamp NL
+
+ [At start, exactly once.]
+
+ The Unix Epoch time in seconds when the file was created.
+ It does not follow the KeyValue format for backwards
+ compatibility with version 1.0.0.
+
+ "version=" version_number NL
+
+ [In second position, zero or one time.]
+
+ The specification document format version.
+ It uses semantic versioning [5].
+
+ This Line has been added in version 1.1.0 of this specification.
+
+ Version 1.0.0 documents do not contain this Line, and the
+ version_number is considered to be "1.0.0".
+
+ "software=" Value NL
+
+ [Zero or one time.]
+
+ The name of the software that created the document.
+
+ This Line has been added in version 1.1.0 of this specification.
+
+ Version 1.0.0 documents do not contain this Line, and the software
+ is considered to be "torflow".
+
+ "software_version=" Value NL
+
+ [Zero or one time.]
+
+ The version of the software that created the document.
+ The version may be a version_number, a git commit, or some other
+ version scheme.
+
+ This Line has been added in version 1.1.0 of this specification.
+
+ "generator_started=" DateTime NL
+
+ [Zero or one time.]
+
+ The date and time, in ISO 8601 format and UTC time zone,
+ when the generator started.
+
+ This Line has been added in version 1.1.0 of this specification.
+
+ "earliest_bandwidth=" DateTime NL
+
+ [Zero or one time.]
+
+ The date and time, in ISO 8601 format and UTC time zone,
+ when the first relay bandwidth was obtained.
+
+ This Line has been added in version 1.1.0 of this specification.
+
+ KeyValue NL
+
+ [Zero or more times.]
+
+ There MUST NOT be multiple KeyValue header Lines with the same key.
+ If there are, the parser SHOULD choose an arbitrary Line.
+
+ If a parser does not recognize a Keyword in a KeyValue Line, it
+ MUST be ignored.
+
+ Future format versions may include additional KeyValue header Lines.
+ Additional header Lines will be accompanied by a minor version
+ increment.
+
+ Implementations MAY add additional header Lines as needed. This
+ specification SHOULD be updated to avoid conflicting meanings for
+ the same header keys.
+
+ Parsers MUST NOT rely on the order of these additional Lines.
+
+ Additional header Lines MUST NOT use any keywords specified in the
+ relay measurements format.
+ If they do, the parser MAY ignore conflicting keywords.
+
+ Terminator NL
+
+ [Zero or one time.]
+
+ The Header List section ends with this Terminator.
+
+ In version 1.0.0, Header List ends when the first relay bandwidth
+ is found conforming to the next section.
+ Implementations of version 1.1.0 SHOULD include this Line.
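A header parser following the rules in this section might look like the sketch below. It is illustrative only: parse_header is a hypothetical name, the input is assumed to be a list of Lines with newlines already stripped, and treating the first Line that contains a space as the first RelayLine is an assumption used to handle 1.0.0 files that have no Terminator.

```python
import re

TERMINATOR = "====="


def parse_header(lines):
    """Return (header dict, remaining Lines after the Header List)."""
    if not lines or re.fullmatch(r"[0-9]+", lines[0]) is None:
        raise ValueError("the first Line must be a Timestamp")
    # Without a version Line, the format is considered to be 1.0.0.
    header = {"timestamp": int(lines[0]), "version": "1.0.0"}
    index = 1
    while index < len(lines):
        line = lines[index]
        if line == TERMINATOR:
            index += 1  # consume the Terminator
            break
        if " " in line:
            break  # assumed to be the first RelayLine (1.0.0 style)
        key, sep, value = line.partition("=")
        if sep and key and value:
            if key != "version":
                # On duplicate keys, keep the first Line seen.
                header.setdefault(key, value)
            elif index == 1:
                # version is only accepted in second position.
                header["version"] = value
        # Lines that do not conform are ignored.
        index += 1
    return header, lines[index:]
```

Unrecognized keys are kept in the dict rather than dropped, so a caller can ignore them or log them as it prefers.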
+
+2.3. Relays' Bandwidth List format
+
+It consists of zero or more RelayLines with the relays' bandwidth
+in arbitrary order.
+
+There MUST NOT be multiple KeyValue pairs with the same key in the same
+RelayLine.
+If there are, the parser SHOULD choose an arbitrary Value.
+
+There MUST NOT be multiple RelayLines per relay identity (node_id or
+master_key_ed25519).
+If there are, parsers SHOULD issue a warning and MAY choose an arbitrary
+value or ignore both values.
+
+If a parser does not recognize any extra material in a RelayLine,
+the extra material MUST be ignored.
+
+Each RelayLine MUST include the following KeyValue pairs:
+In version 1.0.0, node_id MUST NOT be at the end of the Line.
+In version 1.1.0, the KeyValue pairs can be in any order.
+[TODO: list of Tor versions that support it, when it's done]
+
+ "node_id=" hexdigest
+
+ [Exactly once.]
+
+ The fingerprint for the relay's RSA identity key.
+
+ "master_key_ed25519=" MasterKey
+
+ [Zero or one time.]
+
+ The relay's master Ed25519 key, base64 encoded,
+ without trailing "="s, to avoid ambiguity with KeyValue "="
+ character.
+
+ Implementations of version 1.1.0 SHOULD include both node_id and
+ master_key_ed25519.
+ Parsers SHOULD accept Lines that contain at least one of them.
+
+ "bw=" Bandwidth
+
+ [Exactly once.]
+
+ The measured bandwidth of this relay.
+
+ Tor accepts zero bandwidths, but they trigger bugs in older Tor
+ implementations. Therefore, implementations SHOULD NOT produce zero
+ bandwidths. Instead, they SHOULD use one as their minimum bandwidth.
+ If there are zero bandwidths, the parser MAY ignore them.
+
+ Multiple measurements can be aggregated using an averaging scheme,
+ such as a mean, median, or decaying average.
+
+ Torflow scales bandwidths to kilobytes per second. Other
+ implementations SHOULD use kilobytes per second for their initial
+ bandwidth scaling.
+
+ If different implementations or configurations are used in votes for
+ the same network, their measurements MAY need further scaling. See
+ Appendix B for information about scaling, and one possible scaling
+ method.
+
+ KeyValue
+
+ [Zero or more times.]
+
+ Future format versions may include additional KeyValue pairs on a
+ RelayLine.
+ Additional KeyValue pairs will be accompanied by a minor version
+ increment.
+
+ Implementations MAY add additional relay KeyValue pairs as needed.
+ This specification SHOULD be updated to avoid conflicting meanings
+ for the same Keywords.
+
+ Parsers MUST NOT rely on the order of these additional KeyValue
+ pairs.
+
+ Additional KeyValue pairs MUST NOT use any keywords specified in the
+ header format.
+ If they do, the parser MAY ignore conflicting keywords.
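The per-relay requirements in this section can be checked roughly as follows. This is a sketch under stated assumptions: check_relay is a hypothetical helper that takes a dict of KeyValue pairs already split from one RelayLine, it prefers node_id over master_key_ed25519 as the identity, and it uses the MAY clause above to skip zero bandwidths.

```python
import re

# hexdigest: a '$' followed by 40 hexadecimal characters.
HEXDIGEST_RE = re.compile(r"\$[A-Fa-f0-9]{40}")


def check_relay(pairs):
    """Return (identity, bandwidth), or None if the Line should be skipped."""
    node_id = pairs.get("node_id")
    if node_id is not None and HEXDIGEST_RE.fullmatch(node_id) is None:
        node_id = None  # malformed fingerprint: treat as missing
    # Accept Lines that contain at least one of the two identities.
    identity = node_id or pairs.get("master_key_ed25519")
    bw = pairs.get("bw", "")
    if identity is None or re.fullmatch(r"[0-9]+", bw) is None:
        return None
    bandwidth = int(bw)
    if bandwidth == 0:
        return None  # the parser MAY ignore zero bandwidths
    return identity, bandwidth
```

Any other keys in the dict, such as nick, are simply not consulted, which matches the rule that unrecognized extra material is ignored.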
+
+2.4. Implementation notes
+
+This section lists the KeyValue pairs in RelayLines that current
+implementations generate.
+
+2.4.1. Simple Bandwidth Scanner
+
+Every RelayLine in sbws version 0.1.0 consists of:
+
+ "node_id=" hexdigest SP
+
+ As above.
+
+ "bw=" Bandwidth SP
+
+ As above.
+
+ "nick=" nickname SP
+
+ [Exactly once.]
+
+ The relay nickname.
+
+ "rtt=" Int SP
+
+ [Exactly once.]
+
+ The Round Trip Time in milliseconds to obtain 1 byte of data.
+
+ "time=" DateTime NL
+
+ [Exactly once.]
+
+ The date and time, in ISO 8601 format and UTC time zone,
+ when the last bandwidth was obtained.
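As an illustration of the layout above, a generator could assemble such a RelayLine like this. The sketch is not sbws code: format_relay_line and its parameters are hypothetical, the fingerprint is assumed to be passed without the leading '$', and measured_at is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone


def format_relay_line(fingerprint, bw, nick, rtt_ms, measured_at):
    """Build one RelayLine in the key order listed above."""
    # DateTime is "YYYY-MM-DDTHH:MM:SS" in UTC.
    when = measured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    # Generators SHOULD NOT produce zero bandwidths; use one as the minimum.
    bandwidth = max(int(bw), 1)
    return "node_id=${} bw={} nick={} rtt={} time={}\n".format(
        fingerprint, bandwidth, nick, rtt_ms, when
    )
```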
+
+2.4.2. Torflow
+
+Torflow RelayLines include node_id and bw, and other KeyValue pairs [2].
+
+References:
+
+1. https://gitweb.torproject.org/torflow.git
+2. https://gitweb.torproject.org/torflow.git/tree/NetworkScanners/BwAuthority/README.spec.txt#n332
+3. https://gitweb.torproject.org/torspec.git/tree/dir-spec.txt
+4. https://gitweb.torproject.org/torspec.git/tree/version-spec.txt
+5. https://semver.org/
+
+A. Sample data
+
+The following has not been obtained from any real measurement.
+
+A.1. Generated by Torflow
+
+This is an example version 1.0.0 document:
+
+1523911758
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 bw=760 nick=Test measured_at=1523911725 updated_at=1523911725 pid_error=4.11374090719 pid_error_sum=4.11374090719 pid_bw=57136645 pid_delta=2.12168374577 circ_fail=0.2 scanner=/filepath
+node_id=$96C15995F30895689291F455587BD94CA427B6FC bw=189 nick=Test2 measured_at=1523911623 updated_at=1523911623 pid_error=3.96703337994 pid_error_sum=3.96703337994 pid_bw=47422125 pid_delta=2.65469736988 circ_fail=0.0 scanner=/filepath
+
+A.2. Generated by sbws version 0.1.X
+[TODO: this needs to be implemented when this spec is finished]
+
+1523911758
+version=1.1.0
+software=sbws
+software_version=0.1.0
+generator_started=2018-05-08T16:13:25
+earliest_bandwidth=2018-05-08T16:13:26
+====
+node_id=$68A483E05A2ABDCA6DA5A3EF8DB5177638A27F80 master_key_ed25519=YaqV4vbvPYKucElk297eVdNArDz9HtIwUoIeo0+cVIpQ bw=760 nick=Test rtt=380 time=2018-05-08T16:13:26
+node_id=$96C15995F30895689291F455587BD94CA427B6FC master_key_ed25519=a6a+dZadrQBtfSbmQkP7j2ardCmLnm5NJ4ZzkvDxbo0I bw=189 nick=Test2 rtt=378 time=2018-05-08T16:13:36
+
+B. Scaling bandwidths
+
+B.1. Scaling requirements
+
+Tor accepts zero bandwidths, but they trigger bugs in older Tor
+implementations. Therefore, scaling methods SHOULD perform the
+following checks:
+ * If the total bandwidth is zero, all relays should be given equal
+ bandwidths.
+ * If the scaled bandwidth is zero, it should be rounded up to one.
+
+Initial experiments indicate that scaling may not be needed for
+torflow and sbws, because their measured bandwidths are similar
+enough already.
+
+B.2. A linear scaling method
+
+If scaling is required, here is a simple linear bandwidth scaling
+method, which ensures that all bandwidth votes contain approximately
+the same total bandwidth:
+
+1. Calculate the relay quota by dividing the total measured bandwidth
+ in all votes, by the number of relays with measured bandwidth
+ votes. In the public tor network, this is approximately 7500 as of
+ April 2018. The quota should be a consensus parameter, so it can be
+ adjusted for all generators on the network.
+
+2. Calculate a vote quota by multiplying the relay quota by the number
+ of relays this bandwidth authority has measured
+ bandwidths for.
+
+3. Calculate a scaling factor by dividing the vote quota by the
+ total unscaled measured bandwidth in this bandwidth
+ authority's upcoming vote.
+
+4. Multiply each unscaled measured bandwidth by the scaling
+ factor.
+
+Now, the total scaled bandwidth in the upcoming vote is
+approximately equal to the quota.
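The four steps above, together with the checks from B.1, can be sketched as follows. This is one possible reading rather than a reference implementation: scale_bandwidths is a hypothetical name, the relay quota is passed in as a parameter (defaulting to the approximate April 2018 figure), and giving every relay the relay quota when the total is zero is an assumption about what "equal bandwidths" should mean.

```python
def scale_bandwidths(measured, relay_quota=7500):
    """measured maps a relay identity to its unscaled bandwidth.
    Returns a dict with the scaled integer bandwidths."""
    if not measured:
        return {}
    # Step 2: vote quota = relay quota * number of measured relays.
    vote_quota = relay_quota * len(measured)
    total = sum(measured.values())
    if total == 0:
        # B.1: with a zero total, give all relays equal bandwidths.
        return {relay: max(1, relay_quota) for relay in measured}
    # Step 3: scaling factor = vote quota / total unscaled bandwidth.
    factor = vote_quota / total
    # Step 4, plus B.1: round a scaled zero up to one.
    return {relay: max(1, round(bw * factor)) for relay, bw in measured.items()}
```

For example, two relays measured at 100 and 300 with a relay quota of 7500 give a vote quota of 15000 and a factor of 37.5, so the scaled values are 3750 and 11250.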
+
+B.3. Quota changes
+
+If all generators are using scaling, the quota can be gradually
+reduced or increased as needed. Smaller quotas decrease the size
+of uncompressed consensuses, and may decrease the size of
+consensus diffs and compressed consensuses. But if the relay
+quota is too small, some relays may be over- or under-weighted.