[tor-talk] Load Balancing/High Availability Hidden Services
Donncha O'Cearbhaill
donncha at donncha.is
Fri Mar 13 12:50:41 UTC 2015
On 13/03/15 12:02, George Kadianakis wrote:
> Donncha O'Cearbhaill <donncha at donncha.is> writes:
>
>> On 11/03/15 17:40, George Kadianakis wrote:
>>> MacLemon <tor at maclemon.at> writes:
>>>
>>>> Hoi!
>>>>
>>>> I'm looking into ideas for creating “load balanced” or “high availability” hidden services. Mostly pertaining to web servers serving large-ish static files. (Let's say 5-100MB each.)
>>>>
>>>> Load balanced as in not all requests end up at the same box to speed up downloads.
>>>> High availability as in the service is still available if one box goes down or is taken offline for maintenance.
>>>>
>>>> So, not exactly your usual distributed-cluster setup.
>>>>
>>>>
>>>> From what I understand it would not make sense to run the same HS Key on multiple boxes since the descriptors would overwrite each other every few minutes.
>>>>
>>>> I don't think one can do something like Round-Robin DNS with HS.
>>>>
>>>> So the only way I can imagine this to work is a central redirection node that knows about all the nodes and more or less intelligently/randomly 302 redirects each file request to a known-to-it server.
>>>>
>>>> This still leaves a single point of failure in the form of the redirection server but would at least distribute the traffic load across multiple servers and cope with nodes coming and going.
>>>>
>>>> Has anyone done something like this?
>>>>
>>>
>>> An application-layer load balancer like HAProxy might be able to help you.
>>>
>>> Unfortunately, there is not something equivalent to DNS Round Robin
>>> for hidden services yet. There are some ideas on how to do this on the
>>> Introduction Point-layer, but a proposal still needs to be written. For
>>> further reading:
>>> https://lists.torproject.org/pipermail/tor-dev/2014-April/006788.html
>>> https://lists.torproject.org/pipermail/tor-dev/2014-May/006812.html
>>>
>>
>> I've been thinking a little about this problem. It seems like one simple
>> solution would be to find a way of combining the
>> descriptors/introduction points from multiple Tor HS instances into one
>> hidden service descriptor from which the client can pick intro points at
>> random.
>>
>> To implement a solution like that, there needs to be a means for
>> the hidden service instances to communicate with each other. They can
>> then either select a common set of intro points or combine the
>> individual sets of intro points selected by each instance.
>>
>> In one straightforward implementation a hidden service operator could
>> set up a Tor relay and a hidden service on each of their load-balancing
>> nodes. These load balancing hidden services SHOULD have different hidden
>> service keys and could use stealth authentication for privacy (so that
>> their introduction points are encrypted).
>>
>> A management server would contain the actual hidden service key but
>> would not need to run any hidden services. The role of the management
>> server would be to regularly fetch descriptors for the load-balancing
>> hidden services, combine the introduction points into a single
>> descriptor for the hidden service which is then published as normal.
>>
>> After the signature on the hidden service descriptor is verified, the
>> hidden service protocol doesn't use the permanent key/onion
>> address. As such there is no need for each of the relays to have a copy
>> of the hidden service key.
>>
>> I think this provides a couple of advantages:
>>
>> - The hidden service key only needs to be in one place. The machine
>> holding the key would generate very little traffic and would not be
>> locatable by the publicly known hidden service attacks.
>>
>> - The hidden service key could be stored encrypted on the management
>> server.
>>
>> - If any of the load balancing relays are compromised an operator
>> simply needs to stop including its introduction points in the
>> descriptor. This should minimise the need to 'revoke' hidden service keys.
>>
>> The number of introduction points in a HS descriptor is currently
>> limited to 10 in Tor. This sets a limit on the number of load-balancing
>> nodes that could be deployed at present.
>>
>> This approach doesn't require any changes to the Tor code base at all. I
>> hope to implement a management server in the next few weeks to test how
>> this works in practice.
>>
>> It would be great to get any feedback about this proposal!
>>
>
> Interesting approach.
>
> I especially like the fact that it doesn't need little-t-tor
> modifications to work, which means that we can experiment with it and
> if its UX and threat model work out for people we can even change
> little-t-tor to improve it.
>
> The main problem I see is that all parts of the system will be racing
> each other constantly. One harmful race condition I can see is this:
>
> a) Management server is about to make a new superdescriptor.
> Management server polls HSDirs for descriptors and
> waits till it receives them.
>
> b) At the same time as (a), one of the HS nodes abandons an
> introduction point (because it expired or because it went down),
> and publishes a new descriptor with a new intro point.
>
> c) The management server is done forming the superdescriptor and
> publishes it. The management server was not aware of event (b) and
> hence the superdescriptor includes the old introduction point of
> that node, which means that clients who pick that IP will fail.
>
> I think that this is not a catastrophic failure because a client will
> move on to the next intro point in the list.
Those kinds of race conditions will definitely be an issue to some
extent. Is there any data about how often IP circuits fail in practice?
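
To make the failure mode in (a) to (c) concrete, here is a rough Python sketch of a client falling through a stale intro point. It is only an illustration: the function name, the `ip-*` labels and the reachability check are all made up for this example, and it does not model Tor's real retry limits or timeouts.

```python
import random

def pick_working_intro_point(intro_points, is_reachable, rng=random):
    """Try intro points in random order.

    Returns (point, attempts) for the first reachable point, or
    (None, attempts) if every point in the descriptor failed.
    """
    candidates = list(intro_points)
    rng.shuffle(candidates)
    for attempts, point in enumerate(candidates, start=1):
        if is_reachable(point):
            return point, attempts
    return None, len(candidates)

# Superdescriptor as published in step (c): "ip-old" was already
# abandoned by its HS node in step (b), so it is stale.
superdescriptor = ["ip-a", "ip-old", "ip-b"]
live = {"ip-a", "ip-b", "ip-new"}

# Hypothetical reachability check: a point works only if it is still live.
point, attempts = pick_working_intro_point(superdescriptor, live.__contains__)
# point is "ip-a" or "ip-b", found after at most two attempts.
```

The cost of the race is one wasted attempt per stale intro point the client happens to pick first, so how much it hurts depends on how long a failed attempt takes.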
>
> It's also worth noting that in this system, availability is enforced
> by having the client try the next introduction point. That is, if an HS
> node is down, introductions to it will fail and the client will only
> be successful when she moves to the intro point of an active HS
> node. Hence, we should ensure that clients don't spend 5 mins before
> moving to the next intro point, or don't abandon connecting after
> failing to connect to two bad intro points.
Is there a limit on introduction point attempts or timeouts in
little-t tor at present?
>
> Other race conditions here involve the management server missing
> descriptors from certain HS nodes and publishing incomplete
> descriptors, or not publishing anything till all HS node descriptors
> have been retrieved. We should make sure that the management server
> polls often enough that these problems are minimized.
For now I'll poll every 15 minutes. If the management server detects
that the IP set has changed it can then republish the descriptor.
Hopefully that will minimise how often clients receive expired IPs.
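
Roughly, the poll-and-republish logic could look like the Python sketch below. This is a sketch under assumptions, not working management-server code: `fetch_intro_points` and `publish` are hypothetical callables standing in for the descriptor fetch and upload steps, and intro points are represented as plain strings.

```python
import time

POLL_INTERVAL_SECONDS = 15 * 60  # the 15-minute polling interval

def intro_set_changed(published, fetched):
    """True if the fetched intro point set differs from the published one."""
    return set(published) != set(fetched)

def poll_once(published, fetch_intro_points, publish):
    """Fetch the current intro points and republish only if they changed.

    Returns the intro point set that is published after this poll.
    """
    fetched = set(fetch_intro_points())
    if intro_set_changed(published, fetched):
        publish(sorted(fetched))
        return fetched
    return set(published)

def run(fetch_intro_points, publish, sleep=time.sleep):
    """Poll forever, sleeping POLL_INTERVAL_SECONDS between polls."""
    published = set()
    while True:
        published = poll_once(published, fetch_intro_points, publish)
        sleep(POLL_INTERVAL_SECONDS)
```

Comparing sets rather than lists means a descriptor that only reorders the same intro points does not trigger a republish.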
>
> On security now, it's worth having in mind that HSDir servers and HS
> clients can now monitor presence of HS nodes, by looking at the
> content of superdescriptors. Also, it will probably be easy to learn
> whether an HS is scalable and approximately how many nodes it has,
> just by looking at the number of IPs.
>
> BTW, why do you say that "a hidden service operator could set up a Tor
> relay and a hidden service on each of their load-balancing nodes"? Why
> do we need a Tor relay in this case?
Ah I misspoke, I meant to say a Tor instance, not a Tor relay.
>
> Thanks for the idea.
>
> Looking forward to seeing what happens with this!
>
For the management server to retrieve and upload descriptors via an
unpatched Tor instance, #3523 and #14847 would need to be merged.
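
Between retrieving and uploading, the management server has to combine the intro points while staying within the 10 intro point limit mentioned earlier. One possible way to do that is sketched below in Python. The round-robin selection is an assumption chosen so that every instance stays represented when there are more candidates than slots; the function name and the string labels are hypothetical.

```python
from itertools import zip_longest

MAX_INTRO_POINTS = 10  # current per-descriptor limit in Tor

def combine_intro_points(per_instance, limit=MAX_INTRO_POINTS):
    """Merge intro points from several HS instances into one list.

    per_instance maps an instance name to its list of intro points.
    Points are taken round-robin (first point of each instance, then the
    second, and so on), skipping duplicates, until `limit` is reached.
    `limit` is assumed to be a positive integer.
    """
    combined = []
    seen = set()
    for round_of_points in zip_longest(*per_instance.values()):
        for point in round_of_points:
            # None is zip_longest padding for instances with fewer points.
            if point is None or point in seen:
                continue
            combined.append(point)
            seen.add(point)
            if len(combined) == limit:
                return combined
    return combined

# Four instances with three intro points each: 12 candidates, 10 slots.
fetched = {
    "node1": ["a1", "a2", "a3"],
    "node2": ["b1", "b2", "b3"],
    "node3": ["c1", "c2", "c3"],
    "node4": ["d1", "d2", "d3"],
}
superdescriptor_points = combine_intro_points(fetched)
```

Dropping a compromised instance is then just a matter of leaving its entry out of `per_instance` before the next publish.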
Thanks for the feedback,
Donncha