[tor-dev] Proposal: Load Balancing with Overhead Parameters

Mike Perry mikeperry at torproject.org
Fri Jan 15 17:01:22 UTC 2016


Mike Perry:
> Tim Wilson-Brown - teor:
> > > On 13 Jan 2016, at 00:53, Mike Perry <mikeperry at torproject.org> wrote:
> > > 1. Overview
> > > 
> > > For padding overhead due to Proposals 251 and 254, and changes to hidden
> > > service path selection in Proposal 247, it will be useful to be able to
> > > specify a pair of parameters that represents the additional traffic
> > > present on Guard and Middle nodes due to these changes.
> > 
> > I don't know if it's worth noting that proposals 252 and 260 (the Single Onion
> > Services variants) will reduce traffic to guard and middle nodes, as they
> > remove the nodes between the hidden service and client-controlled endpoint
> > (rendezvous point in Rendezvous Single Onion Services, extend point in
> > Single Onion Services).
> > 
> > I think these will just be part of the overhead parameters?
> 
> Probably, though this will require us to have some estimation on the
> amount of network traffic using these services. Will they be broken out
> separately in the extra-info hidden service stats?
> 
> > > 2. Simplified Load Balancing Equations
> > > 
> > > Recall that the point of the load balancing equations in section 3.8.3
> > > of dir-spec.txt is to ensure that an equal amount of client traffic is
> > > distributed between Guards, Middles, Exits, and Guard+Exits, where each
> > > flag type can occupy one or more positions in a path. This allocation is
> > > accomplished by solving a system of equations for weights for flag
> > > position selection to ensure equal allocation of client traffic for each
> > > position in a circuit.
> > > 
> > > If we ignore overhead for the moment and treat Guard+Exit nodes as Exit
> > > nodes, then this allows the simplified system of equations to become:
> > > 
> > >  Wgg*G == M + Wme*E + Wmg*G    # Guard position == middle position
> > >  Wgg*G == Wee*E                # Guard position == equals exit position
> > >  Wmg*G + Wgg*G == G            # Guard allocation weights sum to 1
> > >  Wme*E + Wee*E == E            # Exit allocation weights sum to 1
> > 
> > I think this analysis assumes that all paths in the Tor network are 3-hop paths.
> 
> That is correct.
>  
> > What about the 4-hop paths created by circuit cannibalisation?
> > (We don't actually know what proportion of the Tor network has 3-hop vs 4-hop paths.)
> > 
> > Depending on whether an exit or internal circuit is cannibalised, they can look like:
> > G M E E
> > G M M E
> > 
> > Or, if cannibalised from an exit or internal circuit:
> > G M E M
> > G M M M
> > 
> > Again, I think these will just be part of the overhead parameters?
> 
> Ugh. This will be complicated to work out. Cannibalization rate is a
> function of how often you need to build a path, and have circuits
> available, but none with the right exit/endpoint at the time. This will
> be hard to predict accurately in the wild.
> 
> Cannibalization has been the bane of my existence every time I've made
> any performance or path selection change to Tor. It was an optimization
> introduced because TAP was believed to be too CPU-expensive to allow us
> to opt to build fresh circuits when we needed a different endpoint than
> was currently available in the open set, and so we needed to add the
> new required endpoint to an existing open circuit instead.
> 
> Now that we have ntor, I am wondering if we can just remove
> cannibalization. My guess is that without it, we will pay some up-front
> latency cost in terms of needing to build fresh full circuits in some
> (rare?) cases, but probably will have faster and higher capacity
> circuits for those cases once they are built (especially since the
> Circuit Build Timeout cutoff works better without factoring in
> cannibalization time, as well). Circuit build prediction should also
> reduce this cost over time, since it will ensure that we always have a
> circuit open to recently used endpoint types (either exit ports or
> hidserv-capable circs).
> 
> David Goulet supposedly has a simulator+test script to examine the
> impact of cannibalization. He said that he'd look into updating it for
> 0.2.7/master, but here's a graph of his results from 0.2.5:
> https://people.torproject.org/~dgoulet/hs/rp-025.png
> 
> Basically, that says that unless you make a whole lot of rendezvous
> connections, cannibalized circuits are slower. I'm not sure why their
> speed would be a function of quantity at all.. As I said, circuit
> prediction should reduce the need for cannibalization over time, but it
> shouldn't actually make the circuits that do get cannibalized any
> faster..
> 
> All told, I think in the interest of simplification and also better load
> balancing, we should still kill cannibalization for most cases. The one
> case where I think it may still be desired is in the connection to the
> hsdir, since it seems natural to just extend an existing circuit rather
> than build a whole new one there.
> 
> > And what about hidden service paths (paths that include two middle nodes?)
> > G M M
> 
> This one definitely needs to be included in the M_o overhead. I believe
> the right additive amount is the current estimated percentage of network
> hidden service traffic (by bytes), but we need to make sure we use the
> estimate that reflects the fact that two hidden service paths are used
> for each hidden service connection. So: network total hidden service
> traffic percentage, not actual total hidden service throughput.

With Proposal 247, we decided that to avoid the need to create a
secondary guard connection for issues like
https://trac.torproject.org/projects/tor/ticket/14917, we have to omit
Guard-flagged nodes from the two M's in the G M M path selection for
hidden services and their rendezvous points. This means that we need to
subtract the hidden service overhead from G_o, in addition to adding to
to M_o.

Right now, I'm thinking this means M_o needs to add the total hidden
service *network bytecount* overhead (2X the hidden service end-to-end
traffic bytecount). We also need to subtract 4*Wmg*hs_e2e_bytecount
from the G_o overhead, to account for not using Guard-flagged nodes for
the four M's in full prop-247 G M M M M G circuits.

Again, I still think we should kill cannibalization, so that we don't
have to worry about 7 hop hidden service circuits for a new rendezvous
point being added to an existing cannibalized circuit.


I have added these and the other comments from Teor to a new branch:
https://gitweb.torproject.org/user/mikeperry/torspec.git/tree/proposals/265-load-balancing-with-overhead.txt?h=load_balancing2

They are still in the form of XXX's, pending additional comments. I will
turn them into better descriptions if this makes sense to folks.

-- 
Mike Perry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Digital signature
URL: <http://lists.torproject.org/pipermail/tor-dev/attachments/20160115/d8b5d324/attachment-0001.sig>


More information about the tor-dev mailing list