[tor-dev] BridgeDB - Bridge Distribution Modifications
Matthew Finkel
matthew.finkel at gmail.com
Tue May 14 06:08:01 UTC 2013
Hi all,
Over the last few weeks I've been working with George and Aaron on
updating BridgeDB's code with respect to how it handles pluggable
transports. I've made some decent progress, but there are some
questions that I'd like to ask (because I'm not sure I should be the
one making the decision). I've also started updating the spec and
there are some parts on which I'd like some clarification. I'll try
to summarize our thoughts on the matter thus far. See [A] if you're
unfamiliar with the BridgeDB code/spec/idea.
1) How should BridgeDB decide the number of transports, and types, it
should hand out?
- My current patch returns transports based on the ratio of how many
  there are compared to the other bridges, so that if we hand out
  four bridges and obfs2 bridges account for 3/10 of all running
  bridges, then BridgeDB will hand out (4*(3/10)) = 1.2 obfs2
  bridges with each request, on average. (See the sketch after this
  question.)
- I've also added an option to bridgedb.conf to set the (expected)
  minimum and maximum number of bridges supporting a specific PT
  that BridgeDB should hand out per request.
- I have a verification check that tries to force us to meet these
  values; however, with its current implementation the outcome is
  only probabilistic, not guaranteed. I think this is okay for now.
- So, is this enough? Do we want/need a deterministic method of
  supplying bridges that support a given set of transports?
- Another option is to place each transport into its own subring and
select from each of the subrings to ensure we meet the requirement.
The more I've thought about this, the more I think this defeats
the purpose of constructing the rings, though.
- Last (for now), if a bridge supports multiple PTs, should we
  return all of them to the user, select one at random, or select
  one with a bias? We agreed that we really shouldn't do the first,
  because that would just accelerate a censor's ability to block
  bridges. The middle option works, but given that many bridges now
  support both obfs2 and obfs3, is it a good idea to, again,
  probabilistically return each type (roughly) half the time?
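To make the ratio math above concrete, here is a minimal sketch
(not my actual patch; the function name and parameters are made up)
of handing out the fractional part probabilistically, so the
long-run average matches the ratio:

    import random

    def num_pt_bridges(n_requested, n_pt_total, n_bridges_total):
        # Expected number of PT bridges in one response, e.g.
        # 4 * (3/10) = 1.2 for the obfs2 example above.
        expected = n_requested * (n_pt_total / n_bridges_total)
        count = int(expected)
        # Hand out the fractional part probabilistically: 1.2 means
        # "1 bridge, plus a second one 20% of the time".
        if random.random() < expected - count:
            count += 1
        return count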
2) Should we prefer to distribute PT bridges over regular bridges which
have their ORPort on 443?
- Right now, bridges with their ORPort on 443 get the highest
  priority, and transports are a secondary, best-effort operation.
3) Unless I misunderstand the code, the bridges never rotate. The
bridge interval is set to NoSchedule(), which means it returns
a static time. Is there a reason for this? It's counter to the
spec. Just wondering. :)
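For reference, here is my reading of the schedule difference as a
sketch; the class and method names below are illustrative, not
BridgeDB's actual API:

    class NoSchedule:
        # Every request maps to the same static interval, so the
        # answer a client receives never rotates.
        def interval_start(self, when):
            return 0

    class IntervalSchedule:
        # Requests within the same period map to the same interval
        # start, so answers rotate once per period.
        def __init__(self, period_seconds):
            self.period = period_seconds

        def interval_start(self, when):
            return when - (when % self.period)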
(I had some other points I wanted to raise, but I'm blanking on them
now. I think this is a good start, though.)
Please also correct anything I may have gotten wrong.
Thanks everyone, and thanks to George and Aaron for their help, as well.
- Matt
A. For those who don't know the details of the code, the simplified
version is as follows:
1) All bridges send their bridge descriptors and misc information
to the Bridge Authority.
2) The Bridge Authority provides a network status file containing
all known bridges, described by their name, fingerprint, digest,
time of publication, IP addr, ORPort, and DirPort. The Bridge
Auth also provides a bridge descriptor file specifying each
bridge's IP addr, ORPort, and fingerprint. Last, it supplies an
extra-info file that contains all the extra information the
bridges provide - mainly their transports, in our case.
3) BridgeDB parses all of these files and associates the information
with a single instance of each bridge.
4) BridgeDB assigns each running bridge to a distributor (website,
email, etc.) based on an HMAC of the bridge's ID. Once assigned,
the bridge is inserted into that distributor's list of bridges.
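As a rough illustration of the assignment in step 4 (all names here
are made up, and the real code may weight distributors differently):

    import hashlib
    import hmac

    def assign_distributor(fingerprint, key, distributors):
        # Keyed hash of the bridge's ID; the same bridge always
        # maps to the same distributor for a given key.
        digest = hmac.new(key, fingerprint.encode(),
                          hashlib.sha1).digest()
        index = int.from_bytes(digest[:8], 'big') % len(distributors)
        return distributors[index]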
5) BridgeDB then further organizes the bridges assigned to each
distributor by moving them into rings and subrings.
- A ring is simply a sorted list of HMACs of the bridges' IDs
  which, when traversed, wraps around to the beginning if it ever
  reaches the end.
- The HMAC of a bridge's ID is used to retrieve the actual
  bridge instance from a hash table, which is stored alongside
  the ring.
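Here is a minimal sketch of that ring structure (illustrative only;
the field and method names are invented):

    import bisect
    import hashlib
    import hmac

    class Ring:
        def __init__(self, key):
            self.key = key
            self.positions = []   # sorted HMACs of bridge IDs
            self.bridges = {}     # HMAC -> bridge instance

        def insert(self, bridge):
            pos = hmac.new(self.key, bridge.fingerprint.encode(),
                           hashlib.sha1).hexdigest()
            bisect.insort(self.positions, pos)
            self.bridges[pos] = bridge

        def bridges_after(self, position, n):
            # Walk clockwise from 'position', wrapping around to
            # the start of the sorted list if we run off the end.
            start = bisect.bisect_left(self.positions, position)
            count = min(n, len(self.positions))
            return [self.bridges[self.positions[(start + i) %
                                                len(self.positions)]]
                    for i in range(count)]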
6) Some distributors, such as https, are 'initialized' with a few
rings based on filters.
- https starts out with a ring containing all bridges assigned to
it, a ring only containing bridges which support IPv4
connections, and a ring only containing bridges which support
IPv6 connections.
- Every ring also contains two subrings (currently). One subring
is the subset of bridges from the parent ring which have their
ORPort listening on port 443. The other subring is the subset
of bridges from the parent ring which have the stable flag set.
- For example:
  - Cluster 1 Ring
    - subring (stable)
    - subring (https)
  - Cluster 2 Ring
    - subring (stable)
    - subring (https)
  - IPv4 Cluster 1 Ring
    - subring (stable)
    - subring (https)
  - IPv4 Cluster 2 Ring
    - subring (stable)
    - subring (https)
  - IPv6 Cluster 1 Ring
    - subring (stable)
    - subring (https)
  - IPv6 Cluster 2 Ring
    - subring (stable)
    - subring (https)
7) When BridgeDB receives a request for bridges from its website, it
forwards the query on to the IP distributor. The details include
whether a specific PT was requested, which IP version the bridge
must support, the country within which the bridge should not be
blocked, the requesting IP address, and the interval.
8) The distributor then decides on the "area" of the IP address,
currently the /24 mask, and then finds the "cluster" within that
area (by taking the first eight bytes of an HMAC of the area and
reducing the result modulo the number of clusters). A filter is
then constructed based on the requested information. If a ring
already exists that satisfies exactly these filters, it is used;
else a new ring (with subrings) is constructed to satisfy this
request. The distributor also computes the position in the ring
as the HMAC of the interval and the area.
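In sketch form (hypothetical helper names, not the actual code):

    import hashlib
    import hmac

    def area_of(ip):
        # The "area" is currently the /24 of the requesting address.
        return '.'.join(ip.split('.')[:3])

    def cluster_of(area, key, n_clusters):
        digest = hmac.new(key, area.encode(), hashlib.sha1).digest()
        # First eight bytes of the HMAC, modulo the cluster count.
        return int.from_bytes(digest[:8], 'big') % n_clusters

    def ring_position(interval, area, key):
        msg = (interval + area).encode()
        return hmac.new(key, msg, hashlib.sha1).hexdigest()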
9) Once the correct ring exists, the distributor determines how many
bridges it can find in the ring's subrings to satisfy the request.
This is done by taking the previously computed position, finding
it in the sorted list of the bridges' ID HMACs, and then selecting
the next consecutive "requested number of bridges" entries from
the list (wrapping around to the beginning, if necessary). The
same is then done for the main ring. The results from these
searches are then joined, and the first "requested number of
bridges" unique keys are selected from the combined list. This
list is then sorted and returned, propagating back up to the user.
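Putting step 9 together with the Ring sketch from step 5 (again
illustrative, not the actual implementation):

    def answer_request(ring, subrings, position, n):
        # Gather candidates from each subring, then the main ring.
        candidates = []
        for subring in subrings:
            candidates.extend(subring.bridges_after(position, n))
        candidates.extend(ring.bridges_after(position, n))
        # Keep the first n unique bridges, then sort for a stable
        # answer before returning it up to the user.
        seen, unique = set(), []
        for bridge in candidates:
            if bridge.fingerprint not in seen:
                seen.add(bridge.fingerprint)
                unique.append(bridge)
            if len(unique) == n:
                break
        return sorted(unique, key=lambda b: b.fingerprint)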
10) The other distributors take similar actions. For example, the
email distributor doesn't use an "area" to decide which bridges
to distribute; instead, it uses the normalized requesting/source
mail address.
11) Misc:
- Because the rings are sorted by HMACs of the bridges' IDs, I
  expect the bridges to be uniformly distributed around the ring.
  As such, I don't expect a bias toward one type of
  bridge/transport/ORPort over any other. (Is this incorrect?)