[tor-dev] two new proposals for bridgedb
isis agora lovecruft
isis at torproject.org
Sat Nov 2 10:54:27 UTC 2013
I separated out the backend database improvements to BridgeDB from the social
bridge distributor proposal. The former is finished and up for review and is
being submitted to potential funders; the latter is still a draft (though now
that some parts of the underlying crypto scheme and the threat model are
detailed, early review is always helpful if anyone is super eager to point
fingers at any mistakes I've made).
Copies of both proposals are attached, and are also available in the common
BridgeDB repo at
https://gitweb.torproject.org/bridgedb.git/blob/refs/heads/feature/7520-social-dist-design:/doc/proposals/XXX-bridgedb-database-improvements.txt
and
https://gitweb.torproject.org/bridgedb.git/blob/refs/heads/feature/7520-social-dist-design:/doc/proposals/XXX-bridgedb-social-distribution.txt
--
♥Ⓐ isis agora lovecruft
_________________________________________________________
GPG: 4096R/A3ADB67A2CDB8B35
Current Keys: https://blog.patternsinthevoid.net/isis.txt
-------------- next part --------------
# -*- coding: utf-8 ; mode: org -*-
Filename: XXX-social-bridge-distribution.txt
Title: Social Bridge Distribution
Author: Isis Agora Lovecruft
Created: 18 July 2013
Related Proposals: 199-bridgefinder-integration.txt
XXX-bridgedb-database-improvements.txt
Status: Draft
* I. Overview
This proposal specifies a system for social distribution of the
centrally-stored bridges within BridgeDB. It is primarily based upon Part
IV of the rBridge paper, [1] utilising a coin-based incentivisation scheme
to ensure that malicious users and/or censoring entities are deterred from
blocking bridges, as well as socially-distributed invite tickets to prevent
such malicious users and/or censoring entities from joining the pool of
Tor clients who are able to receive distributed bridges.
* II. Motivation & Problem Scope
As it currently stands, Tor bridges which are stored within BridgeDB may be
freely requested by any entity at nearly any time. While the openness, that
is to say public accessibility, of any anonymity system certainly
provisions its users with the protections of a more greatly diversified
anonymity set, the damages to usability, and the efficacy of such an
anonymity system for censorship circumvention, are devastatingly impacted
due to the equal capabilities of both a censoring/malicious entity and an
honest user to request new Tor bridges.
Thus far, very little has been done to protect the volunteered bridges from
eventually being blocked in various regions. This severely restricts the
overall usability of Tor for clients within these regions, who, arguably,
can be even more in need of the identity protections and free speech
enablement which Tor can provide, given their political contexts.
** II.A. Current Tor bridge distribution mechanisms and known pitfalls:
*** 1. HTTP(S) Distributor
At https://bridges.torproject.org, users may request new bridges, provided
that they are able to pass a CAPTCHA test. Requests through the HTTP(S)
Distributor are not allowed to be made from any current Tor exit relay,
and a hash of the user's actual IP address is used to place them within a
hash ring so that only a subset of the bridges allotted to the HTTP(S)
Distributor's pool may become known to a(n) adversary/user at that IP
address.
**** 1.a. Known attacks/pitfalls:
1) An adversary with a diverse and large IP address space can easily
retrieve some significant portion of the bridges in the HTTPS
Distributor pool.
2) The relatively low cost of employing humans to solve CAPTCHAs is not
sufficient to deter adversaries with requisite economic resources from
doing so to obtain bridges. [XXX cost of employment]
*** 2. Email Distributor
Clients may send email to bridges at bridges.torproject.org with the line
"get bridges" in the body of the email to obtain new bridges. Such emails
must be sent from a Gmail or Yahoo! account, which is required under the
assumption that such accounts are non-trivial to obtain.
**** 2.a. Known attacks/pitfalls:
1) Mechanisms for purchasing pre-registered Gmail accounts en masse
exists, charging between USD$0.25 and USD$0.70 per account. With
roughly 1000 bridges in the Email Distributor's pool, distributing 3
bridges per email response,
* III. Terminology & Notations
** III.A. Terminology Definitions
User := A client connecting to BridgeDB in order to obtain bridges.
** III.B. Notations
|--------------------+---------------------------------------------------------------------------------------------|
| Symbol | Definition |
|--------------------+---------------------------------------------------------------------------------------------|
| U | A user connecting to BridgeDB in order to obtain bridges, identified by a User Credential |
| D | The bridge distributor, i.e. BridgeDB |
| Gᵐᵃˣ | Upper limit (maximum) number of bridge users for a bridge Bᵢ |
| Gˢᵗᵃʳᵗ | Number of starting users |
| Gᵃᵛᵍ | Average number of users per bridge |
| M | Fraction of users which are malicious |
| B | A bridge |
| {B₁, …, Bᵢ, …, Bₖ} | The set of bridges assigned and given to user U |
| k | The number of bridges which have been given to user U |
| Tᵐⁱⁿ | The minimum time which a bridge must remain reachable |
| Tᶜᵘʳ | The current time, given in Unix Era (UE) seconds notation (an integer, seconds since epoch) |
| Tᵐᵃˣ | The upper bound on the time for which a user U can earn coins from Bᵢ |
| τᵢ | The time when bridge Bᵢ was first given to user U |
| tᵢ | The time from when U was first given Bᵢ to either Tᶜᵘʳ or ßᵢ, whichever is greater |
| ßᵢ | The time when bridge Bᵢ was first considered blocked; if not blocked, ßᵢ = 0 |
| ϕ | Total coins owned by user U |
| φᵢ | The coins which user U has earned thus far from bridge Bᵢ |
| ϱᵢ | Rate of earning coins from bridge Bᵢ |
| λᵢ | The probability that bridge Bᵢ has been blocked |
| ω | The last time that U requested and Invite Ticket from D |
|--------------------+---------------------------------------------------------------------------------------------|
* IV. Threat Model
In the original rBridge scheme, there are two separate proposals: the first
does not make any attempt to hide information such as the user's (U)
identity, the list of bridges given to U, the from BridgeDBBridgeDB is
In our modifications to the rBridge social bridge distribution scheme,
BridgeDB is considered a trusted party, that is to say, BridgeDB is
assumed to be honest in all protocols, and no protections are taken to
protect clients from malicious behaviour from BridgeDB.
**** Why we should still hide the Credential from BridgeDB:
Lemma 1:
A User Credential contains that User's list of Bridges, and thus, in all
probability, it uniquely identifies the User.
Proof 1:
For simplicity's sake, if we falsely assume ☥ that the Bridges in a
User's Credential is a constant and static number, then an estimate for
the number of possible Credentials is given by:
Γ(n+1)
nCₖ = ⎽⎽⎽⎽⎽⎽⎽⎽⎽⎽⎽⎽⎽⎽⎽
Γ(m+1)Γ(-m+n+1)
⎛n⎞
for the binomial coefficient ⎝m⎠, where n is the number of Bridges, m is
the number of Bridges in a User Credential, and Γ is the gamma function.
⎛5000⎞
With ⎝ 3 ⎠ there are 2.0820835 x 10¹⁰ possible Credentials, or, roughly
three unique Credentials for every one of the seven billion people alive
on Earth today. The binomial coefficient grows tetrationally for
increasing n and increasing m, [0] and so as the number of Bridge relays
grows over time, and with Users perpetually appending newer Bridges to
their Creditials, the probability of colliding Credentials decreases
tetrationally. Therefore, Credentials are taken to be unique.
Because the Credentials are uniquely identifying, care should be taken so
that two User Credentials cannot be linked by BridgeDB, as this would allow
BridgeDB to obtain a social graph of the network of Bridge Users.
Therefore, it is necessary to hide the Credential from BridgeDB; otherwise,
when requesting an Invite Ticket, the User openly sending their Credential
to BridgeDB to prove possession of the minimum number of Credits would be
linkable to the created Invite Ticket.
----------
☥ It would actually be some complicated series of binomial coefficients with
respect to the individual q-binomial coefficients with q being a
differential of the Bridge turnover w.r.t. time.
*** 1. BridgeDB is permitted to know the following information:
XXX finishme
**** Which Bridges a User is being given
o How many credits a User has
** IV.A. Modifications
The original rBridge scheme is modified to model BridgeDB as a potential
malicious actor. Protecting against this at this point in time is too
costly, both in terms of development time, as well as in network bandwidth
and computational overhead. Instead, prioritization should be placed on
eliminating BridgeDB's ability to obtain a social graph for Tor bridge
users, as this is not information it currently possesses.
The rBridge scheme utilises 1-out-of-m Oblivious Transfer (OT) to allow
BridgeDB to blind a set of m Bridges, letting U pick (and thus learn the
address of) at most one out of the m Bridges. Think of it like a stage
magician waving a fanned deck of cards face down, and asking an audience
member to "pick a card! any card!" While the authors of the original paper
choose Naor and Pinkas' 1-out-of-m OT scheme [2] for its efficiency, they
failed to specify which of Naor and Pinkas' OT schemes ― as there are four
within the referenced paper and several more described elsewhere. For the
sake of continuing the argument against their recommendations to use OT
within the social bridge distribution scheme, it is presumed that the
rBridge authors were referring to the round-optimal 1-out-of-N oblivious
transfer scheme in §4 of that paper.
During the OT process, for each Bridge in m, BridgeDB creates a Blind
Signature of the Bridge and tags each signature to its corresponding
Bridge, so that if U chooses that Bridge, she will also recieve the
signature. The signature schemes utilised is Au et al.'s k-TAA Blind
Signature scheme, [8] which requires a bilinear pairing (XXX what type?)
and is q-SDH secure in the standard model. That k-TAA scheme is chosen
because it is compatible with Zero-Knowledge Proofs-of-Knowledge (ZKPoK),
such that ZKPoK may be made for k-TAA signatures, as well as for
Commitments. Additionally, Au et al.'s k-TAA signature scheme is a
modification to that proposed by Camenisch and Stadler, i.e. it allows for
signatures on message vectors, provided that a nonce is included with the
message vector. See §VII.B for an open research question regarding k-TAA
signature schemes.
Next, U creates a Pedersens Commitment (CMT) to the total amount of Credits
owned by U, and another commitment to the last time that U requested an
Invite Ticket. For each of these commitments, U obtains from BridgeDB
another k_-TAA blind signature on the commitment. Then, U constructs her
own Credential, consisting of the Bridge's tagged blind signature, the
blind signature on each of the commitments, and a hash of the nonce that
used as the blinding factor. (The hash of the nonce is included so that
multiple users may not collude to swap portions of their Credentials by
using the same blinding factor.) The Fiat-Shamir transformation is then
used to convert the aformentioned ZKPoK scheme into a Zero-Knowledge
Non-Interactive Proof-of-Knowledge (NIPK) scheme. With this, U send to D
a Proof of their Credential, without revealing any of its contents.
Every so often, the User requests that BridgeDB update their Credential
with recently earned tokens. XXX finish describing this process
When one of U's Bridges is "blocked", U notifies BridgeDB of the "block"
and, likely, if she has enough Credits to afford it, requests a new bridge.
In the original rBridge design, BridgeDB is only to acknowledge requests
for new bridges after confirming that the Bridge is indeed blocked.
This is where the rBridge design begins to do a bit of handwaving. Either
that, or they neglected both to put sufficient effort into defining the
term "blocked", as well as enough thought into precisely how BridgeDB might
check this. Take for example a User behind a corporate firewall which
blocks undentified encrypted protocols: that User will report her Bridges
as "blocked" ― and they are, for her at least ― though for everyone else
they work just fine. BridgeDB can easily check Bridge reachability from
the location of BridgeDB's server, and possibly can check bridge
reachability from various network vantage points around the world (though
doing this without *causing* the Bridge to become blocked when checking
from censoring regions can quickly become quite complex). [9]
[#]: Au, Man Ho, Willy Susilo, and Yi Mu.
"Proof-of-knowledge of representation of committed value and its
applications." Information Security and Privacy.
Springer Berlin Heidelberg, 2010.
http://web.science.mq.edu.au/conferences/acisp2010/program/Session%2010%20-%20Public%20Key%20Encryption%20and%20Protocols/10-04-AuSM10acisp.pdf
* V. Design
** V.A. Overview
As mentioned, most of this proposal is based upon §IV of the rBridge
paper, which is the non-privacy preserving portion of the paper. [1] The
reasons for deferring implementation of §V include:
- Adding a simpler out-of-band distribution of bridges. Requiring users to
copy+paste Bridge lines into their torrc is ridiculous.
- XXX
Modifications to the original rBridge scheme:
- Remove Oblivious Transfer, keep blind signatures and Pedersen's Commitments.
rBridge uses 1-out-of-m Oblivious Transfer (OT) in order to allow each
client to choose their own Bridges. Simply put, if a User is to be given
three Bridges, then 1-out-of-m OT is run three times: for each time, the
following steps are taken:
1. User picks a set of m nonces and uses them to generate point in the
group G__1 via:
R
yⱼ̍ ⟵―― ℤ*ₚ, where 1 ≤ j ≤ m
2. User computes a Non-Interactive Proof-of-Knowledge (NIPK) of the set
of nonces in the following manner:
⎧ ⎛ ₘ ⎞ ₘ ⎡ yⱼ̍⎤ ⎫
ᴨ₀ = NIPK ⎨ ⎜{yⱼ̍}ⱼ₌₁⎟: ∀ⱼ₌₁⎢ Yⱼ̍ = ɡ₁ ⎥ ⎬
⎩ ⎝ ⎠ ⎣ ⎦ ⎭
⎛ ₘ ⎞
and sends ⎜{Yⱼ̍}ⱼ₌₁ ⃦ ᴨ₀⎟ to BridgeDB.
⎝ ⎠
3. BridgeDB verifies the NIPK of the set of nonces, ᴨ₀, and then created
a one-time keypair:
R ₛₖ⁰
sk⁰ ⟵―― ℤ*ₚ, pk⁰ = h
For each available bridge Bⱼ, BridgeDB randomly selects
R
eⱼ̊,yⱼ̎ ⟵―― ℤ*ₚ,
computes 1
―――――――――
⎛ yⱼ̎ Bⱼ ⎞ eⱼ̊ + ₛₖ⁰
Aⱼ̊ = ⎜ g₀g₁ Yⱼ̍g₃ ⎟
⎝ ⎠
and tags (Aⱼ̊,eⱼ̊,yⱼ̎) to Bⱼ.
4. After OT… ZKNIPK… XXX
Specifically, the 1-out-of-m OT scheme used within the "Part V: rBridge
with Privacy Preservation" section of the paper is described in
"Efficient oblivious transfer protocols" by M. Naor and B. Pinkas. [2] It
requires the use of a bilinear group pairing on a Type-3 supersingular
elliptic curve.
Unfortunately, there are very few FLOSS libraries which currently exist
for pairing-based cryptography. The one used in the benchmarking section
of the rBridge paper is libpbc [3] from Stanford University. Several
cryptographers have offhandedly remarked to me that I should not use this
library in any deployed system. When I mentioned the need for a vetted
pairing-based cryptographic library to Dr. Tanja Lange, she replied that
she has a graduate student working on it -- though when this new library
will be complete is uncertain.
libpbc has Python bindings, although pypbc [4] is quite incomplete and
only in py3k. Additionally, pypbc requires dynamic library overloading of
the shared object libraried for both libpbc and libgmp (the Gnu
Multi-Precision library, [5] which allows for calculations of arbitrary
precision on floats).
Rather than waiting for Dr. Lange's student to complete the new library,
I propose spending some small amount of time (not more than a couple
weeks) creating Python2 bindings for libpbc. From my experience, the
simplest, least error-prone method for creating Python bindings to C
libraries (and with the least amount of effort/knowledge of internal
Python functions involved) is to use CFFI. [7]
- Pedersens' Commitments
- For ZKPoK
** V.C. Data Formats
*** 1. User Credential
A Credential is a signed document obtained from BridgeDB. It contains all
of the state required to verify honest client behavior, and is formatted
as a JSON object with the following format:
{ "Bridges" : [
{ "BridgeLine" : BridgeLine,
"LearnedTS" : TimeStamp,
"CreditsEarned" : INT
},
...
],
"CrenditialTS" : TimeStamp,
"TotalUnspentCredits" : INT
} NL
BridgeLine := <Bridge line from BridgeDB>
TimeStamp := INT
NumCredits := INT
The Timestamp in this case is the time which a user first learned the
existence of that bridge.
Example:
{'Bridges': [
{'BridgeLine': '1.2.3.4:6666 obfs3 adc83b19e793491b1c6ea0fd8b46cd9f32e592fc',
'CreditsEarned': 5,
'Timestamp': 1382078292.864117},
{'BridgeLine': '6.6.6.6:1234 d929c82d2ee727ccbea9c50c669a71075249899f',
'CreditsEarned': 5,
'LearnedTS': 1382078292.864117}],
'CredentialTS': 982398423,
'TotalUnspentCredits': 10}
*** XXX other formats
* VI. Open Questions
** VI.A. In which component of the Tor ecosystem should the client application code go?
*** 1. Should this be done as a Pluggable Transport?
Considerations:
**** 1a. It doesn't need to modify the user's application-level traffic
The clientside will eventually need to be able to build a circuit to the
BridgeDB backend, but it is not necessary that the clientside handle
any of the user's application level traffic. However, the clientside
system of rBridge must start when TBB (or tor) is started.
**** 1b. It needs to be able to start tor.
This is necessary because the lines:
{{{
UseBridges 1
Bridge [...]
}}}
must be present before tor is started; tor will not reload these
settings via SIGHUP.
**** 1c. TorLaucher is not the correct place for this functionality.
I am *not* adding this to TorLauncher. The clientside of rBridge will
eventually need to handle a lot of complicated new cryptographic
primitives, including commitments and zero-knowledge proofs. This is
dangerous enough, period, because there aren't really any libraries
for Pairing-Based Cryptography yet (though Tanya Lange has mentioned
to me that a student of theirs should have a good one finished some
time this year -- but I'm still going to count that as existing like
a unicorn). If I am to write this, I am doing it in
C/Python/Python-extensions. Not JS.
***** c.i It could possibly launch TorLauncher
In other words, this thing edits the torrc according to it's state,
and then either launches tor (if the user wants to use an installed
tor binary) or launches TorLauncher if we're running TBB.
**** 1d. Little-t tor is not the correct place for this either.
It might be possible, instead of (b) or (c), to add this to little-t
tor. However, I feel like the bridge distribution problem is a
separate to tor, which should be (more or less) strictly an
implementation of the onion-routing design. Additionally, I do not
wish to pile more code or maintenance upon eith Nick or Andrea, nor
do I wish to make little-t tor more monolithic.
I talked with Nick briefly about this at the Summer 2013 Tor Dev
meeting in München, and he agreed that little-t tor isn't where this
code should go.
** VI.B. Anonymous Authentication/Signature Schemes?
As the property of conditional anonymity of k-TAA blind signatures is not
utilised in any version of the social bridge distribution design, some
research should be done on other Anonymous or Partial signature schemes
which allow signatures to be made on message vectors. The k-TAA
signature scheme used in rBridge, designed by Au et al., [XXX] was based
off of one of Camenisch and Lysyanskaya's signature schemes. (Which one?)
Of particular interest, the cryptologists Camenisch and Lysyanskaya have
several schemes for various types of anonymous signatures, with varying
properties, as well as "A Formal Treatment of Onion Routing." [XXX] I am
under the impresseion that when they say "anonymous" they mean in the
strong sense (versus other cryptologists who attempt to design signature
schemes with "revocable anonymity", for example, trusted Centralised-PKI
Anonymous Proxy Signature schemes, or signature schemes with "anonymity"
that is revocable by a third party). [XXX]
Specifically, one paper, "Randomizable Proofs and Delegatable Anonymous
Credentials" by Camenisch and Lysyanskaya could be applicable to
simultaneously ensuring all of the following properties for Invite
Tickets:
* The Unlinkability of a generated Invite Ticket to one used later for
registration.
* Strong Anonymity for the holders of such Invite Tickets and for their
eventual recipients. Many "unlinkable token" schemes which rely on
blind signatures, i.e. Chaum's tokens, remain vulnerable to a
particular deanonymisation attack if the Signer is modelled as a
"curious" or malicious entity who stores records of the protocol
steps for blind signatures. [XXX explain]
* Unforgeability
* Verifiability
* VII. Dependencies Upon Other Tor Software
** VII.A. Tor Controllers
*** 1. Proposal #199: Integration of BridgeFinder and BridgeFinderHelper
The client-side code of BridgeDB will essentially be acting as a
BridgeFinder, and thus BridgeDB will require a client-side mechanism for
communication with various Tor Controllers. This is necessary in order to
present a discovery mechanism whereby a Tor Controller may learn the
current number of Credits and Invite Tickets available to a User, and may
display this information in some meaningful manner.
* References
[0]: Ayad, Hanan. "Growth Rate of the Binomial Coefficient."
Lecture Notes on SYDE423 - Computer Algorithm Design and Analysis.
University of Waterloo, Canada, 2008.
http://www.hananayad.com/teaching/syde423/binomialCoefficient.pdf
[1]: http://www-users.cs.umn.edu/~hopper/rbridge_ndss13.pdf
[2]: Naor, Moni, and Benny Pinkas. "Efficient oblivious transfer protocols."
Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms.
Society for Industrial and Applied Mathematics, 2001.
http://www.wisdom.weizmann.ac.il/%7Enaor/PAPERS/eotp.ps
https://gitweb.torproject.org/user/isis/bridgedb.git/tree/refs/heads/feature/7520-social-dist-design:/doc/papers/naor2001efficient.pdf
[3]: https://crypto.stanford.edu/pbc/
http://repo.or.cz/r/pbc.git
[4]: https://www.gitorious.org/pypbc/pages/Documentation
git at gitorious.org:pypbc/pypbc.git
[5]: http://gmplib.org/
[6]: https://metrics.torproject.org/formats.html#descriptortypes
[7]: https://bitbucket.org/cffi/cffi
[8]: Au, Man Ho, Willy Susilo, and Yi Mu. "Constant-size dynamic k-TAA."
Security and Cryptography for Networks.
Springer Berlin Heidelberg, 2006. 111-125.
http://ro.uow.edu.au/cgi/viewcontent.cgi?article=10257&context=infopapers
[19]: https://trac.torproject.org/projects/tor/ticket/6396#comment:16
-------------- next part --------------
# -*- coding: utf-8 ; mode: org -*-
Filename: XXX-bridgedb-database-improvements.txt
Title: "Scalability and Stability Improvements to BridgeDB: Switching to a
Distributed Database System and RDBMS"
Author: Isis Agora Lovecruft
Created: 12 Oct 2013
Related Proposals: XXX-social-bridge-distribution.txt
Status: Open
* I. Overview
BridgeDB is Tor's Bridge Distribution system, which currently has two major
Bridge Distribution mechanisms: the HTTPS Distributor and an Email
Distributor. [0]
BridgeDB is written largely in Twisted Python, and uses Python2's builtin
sqlite3 as its database backend. Unfortunately, this backend system is
already showing strain through increased times for queries, and sqlite's
memory usage is not up-to-par with modern, more efficient, NoSQL databases.
In order to better facilitate the implementation of newer, more complex
Bridge Distribution mechanisms, several improvements should be made to the
underlying database system of BridgeDB. Additionally, I propose that a
clear distinction in terms, as well as a modularisation of the codebase, be
drawn between the mechanisms for Bridge Distribution versus the backend
Bridge Database (BridgeDB) storage system.
This proposal covers the design and implementation of a scalable NoSQL ―
Document-Based and Key-Value Relational ― database backend for storing data
on Tor Bridge relays, in an efficient manner that is ammenable to
interfacing with the Twisted Python asynchronous networking code of current
and future Bridge Distribution mechanisms.
* II. Terminology
BridgeDistributor := A program which decides when and how to hand out
information on a Tor Bridge relay, and to whom.
BridgeDB := The backend system of databases and object-relational mapping
servers, which interfaces with the BridgeDistributor in order
to hand out bridges to clients, and to obtain and process new,
incoming ``@type bridge-server-descriptors``,
``@type bridge-networkstatus`` documents, and
``@type bridge-extrainfo`` descriptors. [3]
BridgeFinder := A client-side program for an Onion Proxy (OP) which handles
interfacing with a BridgeDistributor in order to obtain new
Bridge relays for a client. A BridgeFinder also interfaces
with a local Tor Controller (such as TorButton or ARM) to
handle automatic, transparent Bridge configuration (no more
copy+pasting into a torrc) without being given any
additional privileges over the Tor process, [1] and relies
on the Tor Controller to interface with the user for
control input and displaying up-to-date information
regarding available Bridges, Pluggable Transport methods,
and potentially Invite Tickets and Credits (a cryptographic
currency without fiat value which is generated
automatically by clients whose Bridges remain largely
uncensored, and is used to purchase new Bridges), should a
Social Bridge Distributor be implemented. [2]
* III. Databases
** III.A. Scalability Requirements
Databases SHOULD be implemented in a manner which is ammenable to using a
distributed storage system; this is necessary because many potential
datatypes required by future BridgeDistributors MUST be stored permanently.
For example, in the designs for the Social Bridge Distributor, the list of
hash digests of spent Credits, and the list of hash digests of redeemed
Invite Tickets MUST be stored forever to prevent either from being replayed
― or double-spent ― by a malicious user who wishes to block bridges faster.
Designing the BridgeDB backend system such that additional nodes may be
added in the future will allow the system to freely scale in relation to
the storage requirements of future BridgeDistributors.
Additionally, requiring that the implementation allow for distributed
database backends promotes modularisation the components of BridgeDB, such
that BridgeDistributors can be separated from the backend storage system,
BridgeDB, as all queries will be issued through a simplified, common API,
regardless of the number of nodes system, or the design of future
BridgeDistributors.
*** 1. Distributed Database System
A distributed database system SHOULD be used for BridgeDB, in order to
scale resources as the number of Tor bridge users grows. This database
system, hereafter referred to as DDBS.
The DDBS MUST be capable of working within Twisted's asynchronous
framework. If possible, a Object-Relational Mapper (ORM) SHOULD be used to
abstract the database backend's structure and query syntax from the Twisted
Python classes which interact with it, so that the type of database may be
swapped out for another with less code refactoring.
The DDBM SHALL be used for persistent storage of complex data structures
such as the bridges, which MAY include additional information from both the
`@type bridge-server-descriptor`s and the `@type bridge-extra-info`
descriptors. [3]
**** 1.a. Choice of DDBS
CouchDB is chosen for its simple HTTP API, ease of use, speed, and official
support for Twisted Python applications. [4] Additionally, its
document-based data model is very similar to the current archetecture of
tor's Directory Server/Mirror system, in that an HTTP API is used to
retrieve data stored within virtual directories. Internally, it uses JSON
to store data and JavaScript as its query language, both of which are
likely friendlier to various other components of the Tor Metrics
infrastructure which sanitise and analyse portions of the Bridge
descriptors. At the very least, friendlier than hardcoding raw SQL queries
as Python strings.
**** 1.b. Data Structures which should be stored in a DDBS:
- RedactedDB - The Database of Blocked Bridges
The RedactedDB will hold entries of bridges which have been discovered to
be unreachable from BridgeDB network vantage point, or have been reported
unreachable by clients.
- BridgeDB - The Database of Bridges
BridgeDB holds information on available Bridges, obtained via bridge
descriptors and networkstatus documents from the BridgeAuthority. Because
a Bridge may have multiple `ORPort`s and multiple
`ServerTransportListenAddress`es, attaching additional data to each of
these addresses which MAY include the following information on a blocking
event:
- Geolocational country code of the reported blocking event
- Timestamp for when the blocking event was first reported
- The method used for discovery of the block
- an the believed mechanism which is causing the block
would quickly become unwieldy, the RedactedDB and BridgeDB SHOULD be kept
separate.
- User Credentials
For the Social BridgeDistributor, these are rather complex,
increasingly-growing, concatenations (or tuples) of several datatypes,
including Non-Interactive Proofs-of-Knowledge (NIPK) of Commitments to
k-TAA Blind Signatures, and NIPK of Commitments to a User's current
number of Credits and timestamps of requests for Invite Tickets.
*** 2. Key-Value Relational Database Mapping Server
For simpler data structures which must be persistently stored, such as the
list of hashes of previously seen Invite Tickets, or the list of
previously spent Tokens, a Relational Database Mapping Server (RDBMS)
SHALL be used for optimisation of queries.
Redis and Memcached are two examples of RDBMS which are well tested and
are known to work well with Twisted. The major difference between the two
is that Memcaches are stored only within volatile memory, while Redis
additionally supports commands for transferring objects into persistent,
on-disk storage.
There are several support modules for interfacing with both Memcached and
Redis from Twisted Python, see Twisted's MemCacheProtocol class [5] [6] or
txyam [7] for Memcached, and txredis [8] or txredisapi [9] for
Redis. Additionally, numerous big name projects both use Redis as part of
their backend systems, and also provide helpful documentation on their own
experience of the process of switching over to the new systems. [17] For
non-Twisted Python Redis APIs, there is redis-py, which provides a
connection pool that could likely be interfaced with from Twisted Python
without too much difficultly. [10] [11]
**** 2.a. Data Structures which should be stored in a RDBMS
Simple, mostly-flat datatypes, and data which must be frequently indexed
should be stored in a RDBMS, such as large lists of hashes, or arbitrary
strings with assigned point-values (i.e. the "Uniform Mapping" for the
current HTTPS BridgeDistributor).
For the Social BridgeDistributor, hash digests of the following datatypes
SHOULD be stored in the RDBMS, in order to prevent double-spending and
replay attacks:
- Invite Tickets
These are anonymous, unlinkable, unforgeable, and verifiable tokens
which are occasionally handed out to well-behaved Users by the Social
BridgeDistributor to permit new Users to be invited into the system.
When they are redeemed, the Social BridgeDistributor MUST store a hash
digest of their contents to prevent replayed Invite Tickets.
- Spent Credits
These are Credits which have already been redeemed for new Bridges.
The Social BridgeDistributor MUST also store a hash digest of Spent
Credits to prevent double-spending.
*** 3. Bloom Filters and Other Database Optimisations
In order to further decrease the need for lookups in the backend
databases, Bloom Filters can used to eliminate extraneous
queries. However, this optimization would only be beneficial for string
lookups, i.e. querying for a User's Credential, and SHOULD NOT be used for
queries within any of the hash lists, i.e. the list of hashes of
previously seen Invite Tickets. [14]
**** 3.a. Bloom Filters within Redis
It might be possible to use Redis' GETBIT and SETBIT commands to store a
Bloom Filter within a Redis cache system; [15] doing so would offload the
severe memory requirements of loading the Bloom Filter into memory in
Python when inserting new entries, reducing the time complexity from some
polynomial time complexity that is proportional to the integral of the
number of bridge users over the rate of change of bridge users over time,
to a time complexity of order O(1).
**** 3.b. Expiration of Stale Data
Some types of data SHOULD be safe to expire, such as User Credentials
which have not been updated within a certain timeframe. This idea should
be further explored to assess the safety and potential drawbacks to
removing old data.
If there is data which SHOULD expire, the PEXPIREAT command provided by
Redis for the key datatype would allow the RDBMS itself to handle cleanup
of stale data automatically. [16]
**** 4. Other potential uses of the improved Bridge database system
Redis provides mechanisms for evaluations to be made on data by calling
the sha1 for a serverside Lua script. [15] While not required in the
slightest, it is a rather neat feature, as it would allow Tor's Metrics
infrastructure to offload some of the computational overhead of gathering
data on Bridge usage to BridgeDB (as well as diminish the security
implications of storing Bridge descriptors).
Also, if Twisted's IProducer and IConsumer interfaces do not provide
needed interface functionality, or it is desired that other components of
the Tor software ecosystem be capable of scheduling jobs for BridgeDB,
there are well-tested mechanisms for using Redis as a message
queue/scheduling system. [16]
* References
[0]: https://bridges.torproject.org
mailto:bridges at bridges.torproject.org
[1]: See proposals 199-bridgefinder-integration.txt at
https://gitweb.torproject.org/torspec.git/blob/HEAD:/proposals/199-bridgefinder-integration.txt
[2]: See XXX-social-bridge-distribution.txt at
https://gitweb.torproject.org/user/isis/bridgedb.git/blob/refs/heads/feature/7520-social-dist-design:/doc/proposals/XXX-bridgedb-social-distribution.txt
[3]: https://metrics.torproject.org/formats.html#descriptortypes
[4]: https://github.com/couchbase/couchbase-python-client#twisted-api
[5]: https://twistedmatrix.com/documents/current/api/twisted.protocols.memcache.MemCacheProtocol.html
[6]: http://stackoverflow.com/a/5162203
[7]: http://findingscience.com/twisted/python/memcache/2012/06/09/txyam:-yet-another-memcached-twisted-client.html
[8]: https://pypi.python.org/pypi/txredis
[9]: https://github.com/fiorix/txredisapi
[10]: https://github.com/andymccurdy/redis-py/
[11]: http://degizmo.com/2010/03/22/getting-started-redis-and-python/
[12]: http://www.dr-josiah.com/2012/03/why-we-didnt-use-bloom-filter.html
[13]: http://redis.io/topics/data-types §"Strings"
[14]: http://redis.io/commands/pexpireat
[15]: http://redis.io/commands/evalsha
[16]: http://www.restmq.com/
[17]: https://www.mediawiki.org/wiki/Redis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 1154 bytes
Desc: Digital signature
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20131102/735ce388/attachment-0001.sig>
More information about the tor-dev
mailing list