[tor-dev] [Draft Proposal] Scalable Hidden Services
Matthew Finkel
matthew.finkel at gmail.com
Mon Oct 28 13:19:58 UTC 2013
Hi everyone,
This is a proposal I wrote to implement scalable hidden services. It's
by no means finished (there are some slight inconsistencies which I will
be correcting later today or tomorrow) but I want to make it public in
the meantime. I'm also working on some additional security measures that
can be used, but those haven't been written yet.
Many thanks to George for his initial feedback. I pushed this version to
my public repo, and will continue to push updates there.
In reality this is really 3.2 proposals, so the end result, if accepted,
will look significantly different, but I'm indecisive at this point
about which of the designs is least dangerous.
Thanks,
Matt
-------------- next part --------------
0. Introduction
The current Hidden Service design allows a single node to upload a descriptor
to the set of applicable Hidden Service Directories. This descriptor
describes how a client can establish a connectiob wih the hidden service.
A question arises when the hidden service receives more connections than
it can handle. Perhaps this performance issue is not at the hidden
service, but at the Introduction Points. In either case, the current design
does not sufficiently handle this situation, as a result, some users have
implemented ad-hoc solutions. Any improvements to the hidden service design
should include mechanisms for providing scalable hidden service solutions.
It is our hope that this proposal will provide such an option.
1. Overview
With the implementation of this proposal a hidden service operator will
have the ability to choose a methods by which connections to their hidden
service are load balanced to more than one node for servicing a single
hidden service. We begin by describing the necessary modifications to the
hidden services descriptor in Section 2, then in Section 3 we explain a
Master-Slave configuration, and follow that with a homogeneous nodes
configuration in Section 4. Section 5 delves briefly into a limited-scope
threat model, and Section 6 concludes with the references.
Sections 3 and 4 each present the basic design of their system and then
also describe two possible configurations based on the operators use case.
Let it be understood that this proposal is largely based on, and
supercedes parts of, the previously presented proposal "Migration to
ed25519 HS identity keys and privacy-preserving directory documents" [0].
The timeline for implementation of scalable hidden services also follows
from the implementation of that proposal and as well as the implementation
of the migration plan described in "On upgrading HS identity keys and on
a new HS directory scheme that does not leak" [1]. Please see those
proposals for more details.
Scalable hidden services will not effect usability of the system and will
not significantly change the maintainability of a multi-node hidden
service. Single-node hidden services will not be impacted.
2. Descriptor Modifications
To accomplish load balancing across multiple nodes two new optional fields
are added to the hidden service descriptor [2]
"additional-data" SP public-key
[Any number]
An ephemeral key that may be used to obtain a second descriptor from
the designated hidden service directory server.
"alternate-addresses" SP address NL
[Any number]
A secondary address by which the hidden service may be connected.
3. Master-Slave Configuration
The first optional configuration consists of a master node
and one or more slave nodes. When a master node exists, it is charged with
maintaining the hidden service descriptor and servicing clients, while the
slave nodes only service clients.
In this proposal we describe two possible scenarios for implementing this
paradigm. The first involves distributing a shared master secret key and
the second takes advantage of the hidden services infrastructure to
distribute client traffic to "sub"-hidden services.
3.1. Common Initialization
The two scenarios have some common procedures. These include torrc
configuration settings and master node selection. We will use the term
'peer' to refer to the slave nodes.
All nodes must create a new hidden service that will be used for
communication between nodes. In this configuration all communication will
take place between the master and a slave node, the slave nodes never
communicate with each other.
To configure this "management" hidden service we extend the use of the
HiddenServicePort torrc option. Currently it is defined as
HiddenServicePort VIRTPORT [TARGET]
where VIRTPORT is a virtual port. If the value of VIRTPORT is a negative
integer, we now define the hidden service as a management hidden service.
The hidden service address and key management are unchanged.
After all nodes have created a management hidden service, every other node
must be told about it. To facilitate this, we introduce a new torrc
option. The HiddenServicePeers option will take a list of hidden service
addesses.
HiddenServicePeers hostname1,hostname2,...
One or more hostname will be expected on this line. The first node will be
the master node. The hidden service operator is expected to configure all
nodes with same hostnames and listed in the same order. Failure to do so
will result in undefined behavior.
After this is configured, the master node fetches the hidden service
descriptor for all slave nodes. When it has retrieved all the descriptors
or a 5 minute timeout period is exceeded then it proceeds to the next
phase. If any descriptors were unavailable, then their respective hidden
service is considered unavailable for a 1 hour period. The master node
will attempt to refetch all descriptors, again, after 1 hour. This will
ensure the master can always communicate with its peers. Similarly, all
slave nodes fetch the master node's descriptor every hour.
The next phase of initialization requires that the master node generate
the public key pair for the hidden service to which users will connect.
3.2 Distributed Key
3.2.1 Key Distribution
The first method we propose to distribute load among the peers is by
simply distributing the master secret key to every node. It is extremely
important that a malicious peer node is not within an operator's threat
model if they choose this scenario. The only difference between a slave
node and the master node is that the slave nodes decide not to publish a
descriptor.
To transfer the key we introduce two new cell types
50 -- RELAY_COMMAND_KEY_FETCH
51 -- RELAY_COMMAND_KEY_FETCHED
KEY_FETCH is sent from the slave to the master, and KEY_FETCHED is sent
from the master to the slave.
The cell format of KEY_FETCH is
VER Version [1 octet]
HN Hostname [52 octets]
XL X point length [2 octets]
XP X point [XL octets]
SIG Signature of above fields [variable]
Upon receiving this cell, the master node checks that the signature is
correct using the public key included with the descriptor that maps to the
provided hostname in HN. If HN was not specified on the HiddenServicePeers
or the master node was unable to retrieve its descriptor or the signature
is invalid, then the master node discards the cell. Otherwise, the master
node forms a first-degree polynomial with x0 as the public hidden service's
private key and x1 is a randomly chosen value. We will use a key transport
protocol[3] to send the key to the slave node. We will also choose an
appropriate modulus.
[ Should we hardcode the modulus? ]
The master node then constructs a KEY_FETCHED cell containing:
VER Version [1 octet]
PHN Public hostname [52 octets]
Y0L Y0 point length [2 octets]
Y0P Y0 point [X0L octets]
X1L X1 point length [2 octets]
X1P X1 point [X1L octets]
Y1L Y1 point length [2 octets]
Y1P Y1 point [Y1L octets]
MDL Modulus length [3 octets]
MD Modulus [MDL octets]
SIG Signature of above fields [variable]
This cell is then sent back to the slave node. The slave node checks the
signature using the public key it obtained in the descriptor it has for
the master node. If it does not validate, the node the circuit. On
success, the node reconstructs the curve using the two points it now has.
It also verifies that public hostname in PHN is derived from the public
key it computes. If any of these checks fail, the node makes an attempt to
connect to the next hostname it has listed on the HiddenServicePeers
line. If they all fail, then the node waits a full descriptor period,
refetches the descriptors and tries again.
3.2.2 Distributed-Key Descriptor Creation
The only prerequisite a master node must fulfill before it can publish
the hidden service descriptor is to retrieve the introduction points from
its peers so it can add them to the descriptor for the public hidden
service. To accomplish this we introduce two additional cell types.
52 -- RELAY_COMMAND_SEND_INTROPNT
53 -- RELAY_COMMAND_INTROPNT_RECVD
The slave node sends a SEND_INTROPNT cell composed of:
VER Version [1 octet]
NIP Number of Introduction Points [1 octet]
IPS Introduction Points [variable]
The master node replies with a INTROPNT_RECVD cell that has an empty payload.
The slave node SHOULD use the same circuit for sending a KEY_FETCH cell
and a SEND_INTROPNT cell.
When the master node has received introduction points from each of its
slave nodes or 5 minutes has past since it published the descriptor for
its management hidden service, then it creates a new descriptor for the
public hidden service. If the master node has more than N introduction
points it chooses one of:
A) Publishing the first N introduction points in the descriptor and the
remaining in a separate descriptor, linked to by the address on the
additional-data line.
B) Only publish N introduction points. Select a different set of N
introduction points when the descriptor is next published.
Then it publishes the descriptor for the public hidden service.
3.3. Layered Addressing
3.3.1 Address Distribution
The second method we propose for distributing load among multiple nodes is
to create an abstraction layer. Similar to the design proposed in 3.2,
we maintain the master-slave configuration, but the secret key is
distributed in a need-to-know fashion. This potentially increases the
security of the key for a short time, and therefore limits the threat of
compromise, however it also increases the probability that node failure
will result in the hidden service being unavailable.
Every slave node generates a second hidden service, in addition to its
management hidden service. Each slave then send this address to the master
node. To do this, we introduce two new cell types:
54 -- RELAY_COMMAND_TELL_HSADDR
55 -- RELAY_COMMAND_HSADDR_LEARNED
A slave node sends the TELL_HSADDR cell as:
VER Version [1 octet]
HN Hostname [52 octets]
LEN Address length [1 octet]
ADR Address [LEN octets]
SIG Signature of above fields [variable]
Upon receiving this cell, the master node checks that the signature is
correct using the public key included in the descriptor that maps to the
provided hostname in HN. If HN was no specified on the HiddenServicePeers
or the master node was unable to retrieve its descriptor or the signature
is invalid, then the master node discards the cell. If the cell is
validated, then the address is accepted and the master node replies using
a HSADDR_LEARNED cell with an empty payload.
3.3.2 Layered-Addressing Descriptor Creation
When the master node has received an address from each of its
slave nodes or 5 minutes has past since it published the descriptor for
its management hidden service, then it creates a new descriptor for the
public hidden service using to "alternate-addresses" line to specify the
hidden service addresses for the slave nodes. If the master node has more
than N addresses it chooses one of:
A) Publishing the first N addresses in the descriptor and the
remaining in a separate descriptor, linked to by the address on the
additional-data line.
B) Only publish N addresses. Select a different set of N addresses when
the descriptor is next published.
Then it publishes the descriptor for the public hidden service.
3.3.3. Master Node
If the master node becomes unavailable then the hidden service will not be
able to publish new descriptors. To prevent this from occurring, we
reusing the logic from section 3.2 and share the master secret key
with the lesser of "the next two nodes listed on the HiddenServicePeers
line following the current master, in a circular list" and "the next 1/3
of all nodes on the HiddenServicePeers line following the current master,
in a circular list". All nodes that do not meet this criteria SHOULD
securely expunge knowledge of the key.
All slave nodes will attempt the next potential master node if the
current master if unreachable.
4. Homogeneous
Section 3 discussed scaling hidden services using a master-slave model.
This is useful in some cases, and easier, but is not ideal or safe in all.
As such, we describe a solution involving only peer nodes. Whereas we had
a predefined master node, now we will have an agreed-upon captain.
4.1. Common Initialization
The two scenarios in this section have some common procedures. These
include torrc configuration settings and captain selection. We will use
the term 'peer' to refer to all other nodes that comprise a multi-node
hidden service.
All nodes must create a new hidden service that will be used for
communication between nodes. In this configuration all communication will
take place between all nodes, creating a clique.
Exactly as described in Section 3.1, to configure this "management" hidden
service we extend the use of the HiddenServicePort torrc option. Currently
it is defined as
HiddenServicePort VIRTPORT [TARGET]
where VIRTPORT is a virtual port. If the value of VIRTPORT is a negative
integer, we now define the hidden service as a management hidden service.
The hidden service address and key management are unchanged.
After all nodes have created a management hidden service, every other node
must be told about it. To facilitate this, we introduce a new torrc
option, also described in Section 3.1. The HiddenServicePeers option will
take a list of hidden service addesses.
HiddenServicePeers hostname1,hostname2,...
At least one hostname will be expected on this line. The hidden service
operator is expected to configure all nodes using same hostnames in the
same order. Failure to do so will result in undefined behavior.
We also define 4 new relay commands:
56 -- RELAY_COMMAND_PREAUTH
57 -- RELAY_COMMAND_CONTRIB_KEY
58 -- RELAY_COMMAND_PRELIM_KEY
59 -- RELAY_COMMAND_KEY
4.2.1. Peer-Contributed Key Generation
The public hidden service's private key will be created from input
provided by all available peer nodes on first-launch. The resulting
private key will be known by all nodes.
At initialization-time every peer will request the hidden service
descriptor for every address on its HiddenServicePeers line. If a
descriptor is unfetchable for a D minute period then the peer is
considered unavailable and is not included in the key generation process
by the node.
After a node retrieves all descriptors or reaches the D minute limit, it
then establishes a connection with all available peers. If a connection
can not be established with a peer within a period of E minutes then the
peer is considered to be unavailable and is not included in the key
generation process by the node.
When a connection is successfully established, the node sends a PREAUTH
cell to the hidden service containing the following:
VER Version [1 octet]
HN Hostname [32 octets]
SIG Signature [64 octets]
[ XXX We don't want the hidden service to initiate authentication with
an AUTH_CHALLENGE so we use this as a seed. ]
When the hidden service receives this cell, it checks that the specified
hostname was listed on its HiddenServicePeers line. If it was not then it
destroys the circuit. Otherwise it checks that the signature is valid for
the public key provided in the hidden service descriptor. If it is not
valid then the hidden service destroys the circuit. Otherwise it creates
a temporary mapping between the circuit and the specified hostname. It
then responds with an AUTH_CHALLENGE cell.
The node then creates an AUTHENTICATE cell as defined in Section 4 of
Proposal 220 [ Migrate server identity keys to Ed25519 ] but the CID, SID,
SCERT and TLSSECRETS MUST be zero-filled. The hidden service then verifies
the content. The circuit is destroyed if authentication fails. If
successful the node sends a CONTRIB_KEY cell to the hidden service. The
content of this cell is the peers contribution to the master key.
A CONTRIB_KEY's payload contains:
VER Version [1 octet]
KEY Key Material [64 octets]
After the node's contribution is sent to all available peers and the node
receives contributions from all available peers the node then computes
a preliminary master key by XORing all contributions together. The node
then sends this prelimilary key to all its peers in a PRELIM_KEY cell and
likewise receives a cell from all peers. These transfers MUST occur over
the same circuits that were previously authenticated. If the circuit was
destroyed after successful authenication then a new circuit must be
established and reauthenticated, as defined by a PREAUTH, AUTH_CHALLENGE,
and AUTHENTICATE transaction.
A PRELIM_KEY's payload is the same as the CONTRIB_KEY's payload. The KEY
cell will also have the same format.
When the node has received a PRELIM_KEY cell from its peers it determines
which value was sent by at least two-thirds of its peers. If no value
reaches this ratio then the process is restarted. If a common value
exists then this is the private key for the public hidden service. The
node then sends this key to its peers in a KEY cell, thus making certain
all nodes agreed on the private key. Again, if two-thirds of the keys
are not in agreement then the process is restarted.
4.2.2. Single Node Key Generation
A simpler alternative to Peer-Contributed Key Generation is to allow a
node generate the private key and share it with the its peers.
4.2.3. Key Retreival
If a node is added to the set of peers after key generation takes place
then it must be able to seamlessly retrieve the key. This is done the using
the same method introduced in Section 3.2 with the addition that any node
may be queried.
4.2.4. Captain Selection
When all nodes holds the secret key they are all capable of generating
the hidden service descriptor. The node that creates the descriptor is
the first available node listed on the HiddenServicePeers line, in a ring.
When the current captain is not available, the peers declare the next
node as the new Captain.
4.2.5. Introduction Point Distribution
Before a new descriptor can be published the captain should obtain the
introduction points for its peers. To accomplish this, every node sends
the captain its set of introduction points using the SEND_INTROPNT cell
and receives an acknowledgement with an INTROPNT_RECVD cell defined in
Section 3.2. The captain includes a subset of these introduction points
in the descriptor.
4.2.6. Descriptor Publication
As described in Section 3.2,2 the Captain follows the same logic as the
Master node.
4.3.1. Layered Addressing
This is a mix of Section 3.3 and 4.2 whereas each node has the private
key and can create a hidden service descriptor, but scalability is
achieved by a level of indirection and clients connect to a secondary
hidden service, specific to a node rather than the public hidden
service address derived from with the private key generated in
Section 4.2.1 or 4.2.2.
[ XXX This section will remain mostly unspecified unless it is decided
that this is an advantageous design. ]
5. Threat Model
The security of scalable hidden services is bounded by the protections
that Tor can provide. As such, users of hidden services and inter-hidden
service communication are current assumed to be suseptible to
end-to-end correlation and traffic confirmation attacks, scalable
hidden services do not solve this problem or any other known attack.
Scalable hidden services, the same as hidden services, do not solve
deanonymization attacks on the application layer; this is true for both
client and server (i.e. hidden services do not prevent deanonymization
due to vulnerabilities in Firefox nor PHP) targets.
Scalable hidden services introduce additional attack vectors compared to
single-node hidden services. As a result, the protocol should be designed
under the assumption that the nodes are not compromised but that a
single node can not influence the availability of another node except if
the compromised node was selected as the node that creates the hidden
service descriptor.
It is difficult to hide the size of a multi-node hidden service. These
designs reveal the same amount of information to the Tor network as
single-node hidden services but allow clients to potentially estimate the
size of the service. By placing the burden of attacking hidden services
at the client-end, this allows each node in a scalable hidden service to
maintain the same level of security as an individual single-node hidden
service.
6 References
[0] https://lists.torproject.org/pipermail/tor-dev/2013-October/005536.html
[1] https://lists.torproject.org/pipermail/tor-dev/2013-October/005534.html
[2] https://gitweb.torproject.org/torspec.git/blob/HEAD:/rend-spec.txt#l219
[3]
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.12.5552&rep=rep1&type=pdf
More information about the tor-dev
mailing list