[tor-dev] BridgeDB - Bridge Distribution Modifications

Tue May 14 07:42:47 UTC 2013

On 5/14/13 8:08 AM, Matthew Finkel wrote:
> Hi all,
> 
> Over the last few weeks I've been working with George and Aaron on
> updating BridgeDB's code with respect to how it handles pluggable
> transports.

Hi Matthew,

I didn't read your proposed BridgeDB changes in detail (sorry!), but I'd
like to ask for something: can you make sure that the
bridge-pool-assignments file stays useful when your changes are deployed?

https://metrics.torproject.org/formats.html#bridgepool

We're not processing bridge pool assignment files automatically, but
we'll include them in Atlas once it supports searching for bridges.
Right now the strings make some sense to bridge operators by giving them
an idea whether and how BridgeDB distributes their bridge.  If possible,
this should still be the case in the future.

Thanks!
Karsten

> I've made some decent progress, but there are some
> questions that I'd like to ask (because I'm not sure I should be the
> one making the decision). I've also started updating the spec and
> there are some parts on which I'd like some clarification. I'll try to
> summarize the thoughts on the matter we/I have thus far. See [A] if
> you're unfamiliar with the BridgeDB code/spec/idea.
> 
> 1) How should BridgeDB decide the number of transports, and types, it
>    should hand out?
> 
>   - My current patch returns transports based on the ratio of how many
>     there are compared to the other bridges, so that if we hand out
>     four bridges and obfs2 bridges account for 3/10 of all running
>     bridges, then BridgeDB will hand out (4*(3/10)) = 1.2 obfs2 bridges
>     with each request, on average.
>   - I've also added an option into bridgedb.conf to set the (expected)
>     minimum and maximum number of bridges which support a specific PT
>     that BridgeDB should hand out per request.
>   - I have a verification check that tries to force us to meet these
>     values; however, its current implementation is probabilistic, so
>     meeting them is not guaranteed. I think this is okay for now.
>   - So, is this enough? Do we want/need a deterministic method of
>     supplying bridges with a supported set of transports?
>   - Another option is to place each transport into its own subring and
>     select from each of the subrings to ensure we meet the requirement.
>     The more I've thought about this, the more I think this defeats
>     the purpose of constructing the rings, though.
>   - Last (for now), if a bridge supports multiple PTs, should we return
>     all of them to the user or randomly select one or select one with a
>     bias? We agreed that we really shouldn't do the first because that
>     would just accelerate the ability of a censor to block more bridges.
>     The middle option works, but given that many bridges now support
>     obfs2 and obfs3, is it a good idea to, again, probabilistically
>     return each type (roughly) half the time?
> 
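A minimal sketch of the proportional rule from the first bullet of 1), in Python since that is what BridgeDB is written in. The function names and the clamping step are assumptions for illustration, not BridgeDB's code; the minimum and maximum stand in for the bridgedb.conf option mentioned above, whose actual name is not given in this mail.

```python
from fractions import Fraction


def expected_pt_bridges(bridges_per_request, pt_running, total_running):
    """Expected number of bridges with a given transport per answer."""
    # Proportional rule: share of the answer == share of running bridges.
    return bridges_per_request * Fraction(pt_running, total_running)


def clamp_to_bounds(count, minimum, maximum):
    """Force a per-request count into a configured [minimum, maximum] range."""
    return max(minimum, min(maximum, count))


# The example from the text: 4 bridges per request, obfs2 is 3/10 of all bridges.
expected = expected_pt_bridges(4, 3, 10)
print(float(expected))  # 1.2
```

Any single answer contains a whole number of obfs2 bridges, so 1.2 can only hold as an average over many requests; that is why a minimum/maximum check on each answer is a separate question from the ratio itself.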
> 2) Should we prefer to distribute PT bridges over regular bridges which
>    have their ORPort on 443?
>   - Right now returning ORPorts on 443 is the highest priority and
>     transports are a secondary best-effort operation.
> 
> 3) Unless I incorrectly understand the code, the bridges never rotate.
>    The bridge interval is set to NoSchedule(), which means it returns
>    a static time. Is there a reason for this? This is counter to the
>    spec. Just wondering. :)
> 
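To illustrate why a static interval means no rotation: per step 8 of [A], the position in the ring is an hmac of the interval and the area, so if the interval never changes, neither does the position. The sketch below is hypothetical throughout; the key, the period length, the message format, and both schedule functions are assumptions, not BridgeDB's classes.

```python
import hashlib
import hmac

KEY = b"example-key"  # assumption: placeholder, not a real BridgeDB key


def static_interval(now):
    """Stand-in for a schedule that always returns the same interval."""
    return "static"


def rolling_interval(now, period_seconds=7 * 24 * 3600):
    """Hypothetical schedule: the interval label changes once per period."""
    return str(int(now // period_seconds))


def ring_position(interval, area):
    """Position in the ring, derived from the interval and the client's area."""
    message = "<{}>{}".format(interval, area).encode("utf-8")
    return hmac.new(KEY, message, hashlib.sha1).hexdigest()


area = "192.0.2.0"          # example /24 area (documentation address range)
t0, t1 = 0, 30 * 24 * 3600  # two request times, 30 days apart

# Static interval: same position at both times, so the same bridges come back.
static_same = (ring_position(static_interval(t0), area)
               == ring_position(static_interval(t1), area))
# Rolling interval: the position moves once the period has elapsed.
rolling_same = (ring_position(rolling_interval(t0), area)
                == ring_position(rolling_interval(t1), area))
print(static_same, rolling_same)
```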
> 
> (I had some other points I wanted to raise, but I'm blanking on them
> now. I think this is a good start, though.)
> 
> Please also let me know and correct anything I may have gotten wrong.
> 
> Thanks everyone, and thanks to George and Aaron for their help, as well.
> 
> - Matt
> 
> 
> 
> 
> A. For those who don't know the details of the code, the simplified
>    version is as follows:
> 
>    1) All bridges send their bridge descriptors and misc information
>       to the Bridge Authority.
>    2) Bridge Authority provides a network status file containing all
>       known bridges described by their name, fingerprint, digest,
>       time of publication, IP addr, ORPort, DirPort. Bridge Auth also
>       provides a bridge descriptor file specifying each bridge's
>       IP addr, ORPort, and fingerprint. Last, it supplies an extra-info
>       file that contains all the extra info that the bridges
>       provide - mainly their transports, in our case.
>    3) BridgeDB parses all of these files and associates the information
>       to a single instance of a bridge.
>    4) BridgeDB assigns each running bridge to a distributor (website,
>       email, etc) based on an hmac of the bridge's ID. Once assigned,
>       the bridge is inserted into the distributor's list of bridges.
>    5) BridgeDB then further organizes the bridges assigned to each
>       distributor by moving them into rings and subrings.
>      - A ring is simply a sorted list of hmacs of the bridges' IDs
>        which, when traversed, wraps around to the beginning if it ever
>        reaches the end.
>      - The hmac of the bridge's ID is used to retrieve the actual
>        bridge instance from a hash, which is stored alongside the ring.
>    6) Some distributors, such as https, are 'initialized' with a few
>       rings based on filters.
>      - https starts out with a ring containing all bridges assigned to
>        it, a ring only containing bridges which support IPv4
>        connections, and a ring only containing bridges which support
>        IPv6 connections.
>      - Every ring also contains two subrings (currently). One subring
>        is the subset of bridges from the parent ring which have their
>        ORPort listening on port 443. The other subring is the subset
>        of bridges from the parent ring which have the stable flag set.
>      - For example,
>         - Cluster 1 Ring
>           - subring (stable)
>           - subring (https)
>         - Cluster 2 Ring
>           - subring (stable)
>           - subring (https)
>         - IPv4 Cluster 1 Ring
>           - subring (stable)
>           - subring (https)
>         - IPv4 Cluster 2 Ring
>           - subring (stable)
>           - subring (https)
>         - IPv6 Cluster 1 Ring
>           - subring (stable)
>           - subring (https)
>         - IPv6 Cluster 2 Ring
>           - subring (stable)
>           - subring (https)
>    7) When BridgeDB receives a request for bridges from its website, it
>       forwards the query on to the IP distributor. The details will
>       include whether a specific PT was requested, the IP version the
>       bridge should support, the country within which the bridge should
>       not be blocked, the requesting IP address, and the interval.
>    8) The distributor then decides on the "area" of the IP address,
>       currently the /24 mask, and then finds the "cluster" within that
>       area (by taking the first eight bytes of an hmac of the area and
>       using the result (modulus "the number of clusters")). A filter is
>       then constructed based on the requested information. If a ring
>       already exists that satisfies exactly these filters then that is
>       used. Else a new ring (with subrings) is constructed to satisfy
>       this request. The distributor also computes the position in the
>       ring as the hmac of the interval and the area.
>    9) Once the correct ring exists, it determines how many bridges it
>       can find in the ring's subrings to satisfy the request. This is
>       done by taking the previously computed position and finding it
>       in the sorted list of bridge-ID hmacs and then selecting the next
>       consecutive "requested number of bridges" from the list (wrapping
>       around to the beginning, if necessary). The same is then done for
>       the main ring. The results from these searches are then joined and
>       the first "requested number of bridges" unique keys are selected
>       from the list. This list is then sorted and returned, propagating
>       back up to the user.
>   10) Similar actions are taken by the other distributors. For example,
>       the email distributor doesn't use an "area" to decide which
>       bridges to distribute; it uses the normalized requesting/source
>       mail address instead.
>   11) Misc:
>     - Because the rings are sorted by an hmac of the bridge's ID, I
>       expect that they will be uniformly distributed around the ring.
>       As such, I don't expect there to be a bias for one type of
>       bridge/transport/ORPort over any other. (Is this incorrect?)
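Steps 8 and 9 of [A] can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not BridgeDB's implementation: it leaves out filters and subrings, and the keys, the SHA-1 choice, the "<interval>area" message format, and all names (hmac_digest, area_of, cluster_of, Ring) are hypothetical.

```python
import bisect
import hashlib
import hmac
import ipaddress


def hmac_digest(key, text):
    return hmac.new(key, text.encode("utf-8"), hashlib.sha1).digest()


def area_of(ip):
    """The /24 network containing an IPv4 address (step 8's "area")."""
    return str(ipaddress.ip_network(ip + "/24", strict=False).network_address)


def cluster_of(area, n_clusters, key):
    """First eight bytes of hmac(area), reduced modulo the cluster count."""
    return int.from_bytes(hmac_digest(key, area)[:8], "big") % n_clusters


class Ring:
    """Sorted list of bridge-ID hmacs plus a dict back to the bridges."""

    def __init__(self, key):
        self.key = key
        self.positions = []  # sorted hmacs of bridge IDs
        self.bridge_at = {}  # hmac -> bridge ID

    def insert(self, bridge_id):
        pos = hmac_digest(self.key, bridge_id)
        if pos not in self.bridge_at:
            bisect.insort(self.positions, pos)
            self.bridge_at[pos] = bridge_id

    def get_bridges(self, position, n):
        """Up to n consecutive bridges starting at position, wrapping around."""
        n = min(n, len(self.positions))
        start = bisect.bisect_left(self.positions, position)
        return [self.bridge_at[self.positions[(start + i) % len(self.positions)]]
                for i in range(n)]


ring = Ring(b"ring-key")
for i in range(10):
    ring.insert("bridge-%d" % i)

area = area_of("198.51.100.77")
cluster = cluster_of(area, 4, b"cluster-key")
position = hmac_digest(b"order-key", "<2013-05>" + area)
answer = ring.get_bridges(position, 3)
print(area, cluster, answer)
```

Because the position depends only on the interval and the area, two requests from the same /24 within one interval get the same answer in this sketch.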