[tor-dev] BridgeDB - Bridge Distribution Modifications

Tue May 14 12:37:50 UTC 2013

On Tue, May 14, 2013 at 09:42:47AM +0200, Karsten Loesing wrote:
> On 5/14/13 8:08 AM, Matthew Finkel wrote:
> > Hi all,
> > 
> > Over the last few weeks I've been working with George and Aaron on
> > updating BridgeDB's code with respect to how it handles pluggable
> > transports.
> 
> Hi Matthew,
> 
> I didn't read your proposed BridgeDB changes in detail (sorry!), but I'd
> like to ask for something: can you make sure that the
> bridge-pool-assignments file stays useful when your changes are deployed?
> 
> https://metrics.torproject.org/formats.html#bridgepool
> 
> We're not processing bridge pool assignment files automatically, but
> we'll include them in Atlas once it supports searching for bridges.
> Right now the strings make some sense to bridge operators by giving them
> an idea whether and how BridgeDB distributes their bridge.  If possible,
> this should still be the case in the future.
> 
> Thanks!
> Karsten
> 

Hi Karsten,

Absolutely. To be honest, I don't expect these modifications to impact
that file much, and I see no reason to alter the format of it, but I'll
verify everything remains sane throughout the updates.

Thanks for raising your concern!

Matt

> 
> > I've made some decent progress, but there are some
> > questions that I'd like to ask (because I'm not sure I should be the
> > one making the decision). I've also started updating the spec and
> > there are some parts on which I'd like some clarification. I'll try to
> > summarize the thoughts on the matter we/I have thus far. See [A] if
> > you're unfamiliar with the BridgeDB code/spec/idea.
> > 
> > 1) How should BridgeDB decide the number of transports, and types, it
> >    should hand out?
> > 
> >   - My current patch returns transports based on the ratio of how many
> >     there are compared to the other bridges, so that if we hand out
> >     four bridges and obfs2 bridges account for 3/10 of all running
> >     bridges, then BridgeDB will hand out 4*(3/10) = 1.2 obfs2 bridges
> >     with each request, on average.
> >   - I've also added an option into bridgedb.conf to set the (expected)
> >     minimum and maximum number of bridges which support a specific PT
> >     that BridgeDB should hand out per request.
> >   - I have a verification check that tries to force us to meet these
> >     values; however, with its current implementation it's not
> >     guaranteed, only probabilistic. I think this is okay for now.
> >   - So, is this enough? Do we want/need a deterministic method of
> >     supplying bridges with a supported set of transports?
> >   - Another option is to place each transport into its own subring and
> >     select from each of the subrings to ensure we meet the requirement.
> >     The more I've thought about this, the more I think this defeats
> >     the purpose of constructing the rings, though.
> >   - Last (for now), if a bridge supports multiple PTs, should we return
> >     all of them to the user or randomly select one or select one with a
> >     bias? We agreed that we really shouldn't do the first because that
> >     would just accelerate the ability of a censor to block more bridges.
> >     The middle option works, but given that many bridges now support
> >     obfs2 and obfs3, is it a good idea to, again, probabilistically
> >     return each type (roughly) half the time?
> > 
> > 2) Should we prefer to distribute PT bridges over regular bridges which
> >    have their ORPort on 443?
> >   - Right now returning ORPorts on 443 is the highest priority and
> >     transports are a secondary best-effort operation.
> > 
> > 3) Unless I incorrectly understand the code, the bridges never rotate.
> >    The bridge interval is set to NoSchedule(), which means it returns
> >    a static time. Is there a reason for this? This is counter to the
> >    spec. Just wondering. :)
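
To make the arithmetic in question 1 concrete, here is a minimal sketch of
the expected-count calculation. It assumes each returned bridge supports the
transport with probability equal to that transport's share of running
bridges. The function name is made up for illustration and is not taken
from the BridgeDB code.

```python
from fractions import Fraction


def expected_transport_bridges(bridges_per_request, transport_bridges, running_bridges):
    """Expected number of bridges of one transport per response.

    Assumes each returned bridge is of the given transport with probability
    transport_bridges / running_bridges (proportional selection).
    """
    if running_bridges <= 0:
        raise ValueError("need at least one running bridge")
    return bridges_per_request * Fraction(transport_bridges, running_bridges)


# Numbers from the example above: 4 bridges per request, and obfs2 bridges
# make up 3/10 of all running bridges.
expected = expected_transport_bridges(4, 3, 10)
print(float(expected))  # 1.2
```

Since this is only an average, any single response can still contain zero
obfs2 bridges, which is why the min/max check is probabilistic rather than
guaranteed.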
> > 
> > 
> > (I had some other points I wanted to raise, but I'm blanking on them
> > now. I think this is a good start, though.)
> > 
> > Please also let me know and correct anything I may have gotten wrong.
> > 
> > Thanks everyone, and thanks to George and Aaron for their help, as well.
> > 
> > - Matt
> > 
> > 
> > 
> > 
> > A. For those who don't know the details of the code, the simplified
> >    version is as follows:
> > 
> >    1) All bridges send their bridge descriptors and misc information
> >       to the Bridge Authority.
> >    2) Bridge Authority provides a network status file containing all
> >       known bridges described by their name, fingerprint, digest,
> >       time of publication, IP addr, ORPort, DirPort. Bridge Auth also
> >       provides a bridge descriptor file also specifying the bridges'
> >       IP addr, ORPort, and fingerprint. Last, it supplies an extra-info
> >       file that contains all the extra info that the bridges
> >       provide - mainly their transports, in our case.
> >    3) BridgeDB parses all of these files and associates the information
> >       to a single instance of a bridge.
> >    4) BridgeDB assigns each running bridge to a distributor (website,
> >       email, etc) based on an hmac of the bridge's ID. Once assigned,
> >       the bridge is inserted into the distributor's list of bridges.
> >    5) BridgeDB then further organizes the bridges assigned to each
> >       distributor by moving them into rings and subrings.
> >      - A ring is simply a sorted list of hmacs of the bridges' IDs
> >        which, when traversed, wraps around to the beginning if it ever
> >        reaches the end.
> >      - The hmac of the bridge's ID is used to retrieve the actual
> >        bridge instance from a hash, which is stored alongside the ring.
> >    6) Some distributors, such as https, are 'initialized' with a few
> >       rings based on filters.
> >      - https starts out with a ring containing all bridges assigned to
> >        it, a ring only containing bridges which support IPv4
> >        connections, and a ring only containing bridges which support
> >        IPv6 connections.
> >      - Every ring also contains two subrings (currently). One subring
> >        is the subset of bridges from the parent ring which have their
> >        ORPort listening on port 443. The other subring is the subset
> >        of bridges from the parent ring which have the stable flag set.
> >      - For example,
> >         - Cluster 1 Ring
> >           - subring (stable)
> >           - subring (https)
> >         - Cluster 2 Ring
> >           - subring (stable)
> >           - subring (https)
> >         - IPv4 Cluster 1 Ring
> >           - subring (stable)
> >           - subring (https)
> >         - IPv4 Cluster 2 Ring
> >           - subring (stable)
> >           - subring (https)
> >         - IPv6 Cluster 1 Ring
> >           - subring (stable)
> >           - subring (https)
> >         - IPv6 Cluster 2 Ring
> >           - subring (stable)
> >           - subring (https)
> >    7) When BridgeDB receives a request for bridges from its website, it
> >       forwards the query on to the IP distributor. The details will
> >       include whether a specific PT was requested, the IP version the
> >       bridge must support, the country within which the bridge should
> >       not be blocked, the requesting IP address, and the interval.
> >    8) The distributor then decides on the "area" of the IP address,
> >       currently the /24 mask, and then finds the "cluster" within that
> >       area (by taking the first eight bytes of an hmac of the area and
> >       using the result (modulus "the number of clusters")). A filter is
> >       then constructed based on the requested information. If a ring
> >       already exists that satisfies exactly these filters then that is
> >       used. Else a new ring (with subrings) is constructed to satisfy
> >       this request. The distributor also computes the position in the
> >       ring as the hmac of the interval and the area.
> >    9) Once the correct ring exists, it determines how many bridges it
> >       can find in the ring's subrings to satisfy the request. This is
> >       done by taking the previously computed position and finding it
> >       in the list of bridge ID hmacs and then selecting the next
> >       consecutive "requested number of bridges" from the list (wrapping
> >       around to the beginning, if necessary). The same is then done for
> >       the main ring. The results from these searches are then joined and
> >       the first "requested number of bridges" unique keys are selected
> >       from the list. This list is then sorted and returned, propagating
> >       back up to the user.
> >   10) Similar actions are taken by the other distributors. For example,
> >       the email distributor doesn't use an "area" to decide which
> >       bridges to distribute, it uses the normalized requesting/source
> >       mail address.
> >   11) Misc:
> >     - Because the rings are sorted by an hmac of the bridge's ID, I
> >       expect that they will be uniformly distributed around the ring.
> >       As such, I don't expect there to be a bias for one type of
> >       bridge/transport/ORPort over any other. (Is this incorrect?)
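
Steps 5, 8 and 9 above can be sketched roughly as follows. This is a
simplified illustration under stated assumptions, not the actual BridgeDB
code: the class and function names, the use of HMAC-SHA1, the example key,
and the way the interval and area are combined into one hmac input are all
hypothetical.

```python
import bisect
import hashlib
import hmac


def hmac_hex(key, message):
    """Hex HMAC of a text message (SHA-1 is an assumption of this sketch)."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha1).hexdigest()


def cluster_of(key, area, n_clusters):
    """Step 8: first eight bytes of hmac(area), modulo the cluster count."""
    digest = hmac.new(key, area.encode("utf-8"), hashlib.sha1).digest()
    return int.from_bytes(digest[:8], "big") % n_clusters


class Ring:
    """Step 5: sorted hmacs of bridge IDs plus a hash back to the bridges."""

    def __init__(self, key, bridge_ids):
        self.by_position = {hmac_hex(key, b): b for b in bridge_ids}
        self.positions = sorted(self.by_position)

    def select(self, position, count):
        """Return up to `count` bridges starting at `position`, wrapping."""
        size = len(self.positions)
        start = bisect.bisect_left(self.positions, position)
        picked = [self.positions[(start + i) % size]
                  for i in range(min(count, size))]
        return [self.by_position[p] for p in picked]


def answer_request(ring, subrings, position, count):
    """Step 9: search subrings, then the main ring; keep unique, sort."""
    found = []
    for subring in subrings:
        found.extend(subring.select(position, count))
    found.extend(ring.select(position, count))
    unique = list(dict.fromkeys(found))  # drops repeats, keeps first-seen order
    return sorted(unique[:count])


KEY = b"example-key"  # hypothetical secret
# Hypothetical bridges: every third one listens on port 443.
or_ports = {"bridge%02d" % i: (443 if i % 3 == 0 else 9001) for i in range(12)}
ring = Ring(KEY, or_ports)
port443 = Ring(KEY, [b for b, port in or_ports.items() if port == 443])

area = "192.0.2.55".rsplit(".", 1)[0]          # /24 area of the requesting IP
cluster = cluster_of(KEY, area, n_clusters=4)
position = hmac_hex(KEY, "2013-05-14" + area)  # hmac of interval and area
print(cluster, answer_request(ring, [port443], position, 3))
```

In this sketch the cluster number is only computed, not used to pick among
several rings. Because the subring results are joined first, port-443
bridges fill the answer before the main ring contributes anything; the real
code may limit how many bridges come from each subring.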
> > _______________________________________________
> > tor-dev mailing list
> > tor-dev at lists.torproject.org
> > https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> > 
>