[tor-dev] Torsocks locking design

Thu Jun 13 04:58:14 UTC 2013

On Sun, Jun 09, 2013 at 04:52:45PM -0400, David Goulet wrote:
> Hi everyone,
> 

Hi David!

> I'm posting here the small design document I've made for the Torsocks locking
> mechanism.
> 
> I'm looking for reviews, comments, improvements, ... well anything useful! :).
> (Even English mistakes since it's not my primary language).
> 
> Future changes will be pushed here if any are needed after this thread is done with.
> 
> https://github.com/dgoulet/torsocks/blob/rewrite/doc/proposals/01-Thread-safe-design.txt
> 

Great! Thanks for putting in the time/thought for this!

> Thanks people!
> David

> Date: 09/06/2013
> Author: David Goulet <dgoulet at ev0ke.net>
> Contributors:
> 	* Mathieu Desnoyers <mathieu.desnoyers at efficios.com>
> 
> This document details the design of torsocks locking for thread safety.
> 
> The basis is that at *any* time any thread can interact with Torsocks
> (connect(2), DNS resolution, ...). Considering that, some sort of
> synchronization is needed since torsocks needs to keep track of per socket
> connection. They are kept in a central registry that every thread needs to
> access.

This sounds reasonable.

> 
> The reason why it needs to be shared accross threads is because of fd passing.

s/accross/across/ :)

> It is possible and even not uncommon that threads exchange file descriptor(s) so
> we can't keep a registry of connections using TLS (Thread Local Storage).
> Furthermore, a process could easily, for instance, close(2) a socket within a
> signal handler making Torsocks access thread storage inside a signal handler and
> this SHOULD NOT be done, just never.
> 
> Considering the above, a locking mechanism is needed to protect the registry.
> Insertion, deletion and lookup in that registry CAN occur at the same time so a
> mutex has to be used to protect any action *on* it. This protection is provided
> by a mutex named "registry lock".
> 

Sure, this sounds like a good idea.

> Now, the objects inside that registry have to be protected during their life
> time (as long as a reference is held) after a lookup. To protect the free() from
> concurrent access, a refcount is used. Basically, each possible action on a
> connection object checks if the connection refcount is down to 0. If so, this
> means that no one is holding a reference to the object and it is ready to be
> destroyed. For that, the refcount needs to start with a value of 1.
> 

The beginning of this paragraph seems a bit unclear to me (maybe it's
just me). As of right now, as I understand it, we have an opaque object
that, at a minimum, holds references to the set of (open socket, ref
count) pairs. (Please correct me if I'm wrong.) In addition to this,
whenever this pair is retrieved, the refcount is incremented by 1 within
the registry thus allowing for use of this reference outside of any
critical sections. Then, when the function no longer needs the
reference, it decrements the refcount. If refcount reaches 0, then the
conn is not needed.

> For this scheme to work, the connection object MUST be immutable once created.
> Any part that can change during the lifetime of the connection object MUST be
> protected with a mutex.
> 
> This mechanism is used to avoid heavy contention on the registry lock. Without
> the refcount, a connection would have to be protected within the registry lock
> to avoid race between an access to the object and freeing it. As long as the
> lookups in the registry are not that frequent, this should scale since the
> critical section is pretty short.
> 

I think this should be okay. As it is currently implemented, Torsocks
only "remembers" socket that *it* needs. By this I mean that it only
maintains a reference to a socket if it is currently using the
connection in a SOCKS handshake or DNS resolution. Once Torsocks has
finished with its necessary operations, it returns complete control to
the application and "forgets" about the socket.

If you plan to keep a similar design, then registry entries will mostly
be short-lived and this should scale well. If your plan is to have the
registry maintain a complete list of all open connections, then this
still may scale and have little, or no, adverse affect on performance
but I'm unsure if this provides much benefit. Have you decided which of
these designs you want to use?

> Here is the algorithm for a read/write and destroy operation.
> 
> Prerequisites:
> ----
> Refcount of a connection object starts at 1. When down to 0, it can be
> destroyed.
> 
> Add to registry (e.g.: connect(2)):
> ----
> 1)	lock(registry)
> 2)		new_conn refcount = 1;
> 3)		add to registry
> /* We could also make a lookup for duplicates. */
> 4)	unlock(registry)
> 
> Read/Write op. (e.g.: DNS Lookup):
> ----
> 1)	lock(registry)
> 2)		conn = lookup
> 3)		if conn:
> 4)			atomic_inc(conn refcount)
> 5)	unlock(registry)
> 6)	[action using the conn]
> 7)	if atomic_dec_return(conn refcount) == 0:
> /*
>  * This is safe because at this point the connection object is not visible
>  * anymore to any thread so we can safely free the object after unlocking it.
>  */
> 8)		free conn
> 
> Destroy from registry (e.g.: close(2)):
> ----
> 1)	lock(registry)
> 2)		conn = lookup
> 3)		if conn:
> /*
>  * Make sure the connection object is not visible once we release the registry
>  * lock. Note that we DO NOT increment the refcount here because we are on the
>  * destroy path so we have to make the refcount come down to 0.
>  */
> 4)			remove from registry
> 5)	unlock(registry)
> 6)	if atomic_dec_return(conn refcount) == 0:
> 7)		free(conn)

I think these look good. They seem to lock a very small amount of code,
which is good. I suppose this is a nitpicky question but
regarding atomic_dec_return() and atomic_inc(), do they both return the
updated counter value? Or is atomic_dec_return() the only one
that does that (denoted by the _return suffix)?

I think this is simple, and simple is good. I have some questions about
the actual registry, but I'll defer asking them until they're more
on-topic. I'll try to take another pass of this when I'm less asleep.

Thanks again!

- Matt