[tor-bugs] #7572 [Tor]: Make relay crypto run on multiple CPU cores

Wed Dec 2 04:19:26 UTC 2015

#7572: Make relay crypto run on multiple CPU cores
-----------------------+------------------------------
 Reporter:  nickm      |          Owner:  andrea
     Type:  defect     |         Status:  new
 Priority:  High       |      Milestone:  Tor: 0.2.???
Component:  Tor        |        Version:
 Severity:  Normal     |     Resolution:
 Keywords:  tor-relay  |  Actual Points:
Parent ID:  #1749      |         Points:
  Sponsor:             |
-----------------------+------------------------------

Comment (by jsturgix):

 I looked for an approach that I could generalize and apply to both the
 relay_crypt() case and the circuit_package_relay_cell() case.  At first
 glance, I didn't see anything easy, and since there were already a number
 of moving parts unfamiliar to me, I focused on the relay_crypt() case.

 In general, this was my thought process and approach:

 (1) I created new files src/or/cryptothreads.c and src/or/cryptothreads.h.
 These are modeled after src/or/cpuworker.c and create the thread pool.
 cpuworker.c is big, and I thought cryptothreads.c might also become big.
 As it turned out, it is small, so it might make sense to fold
 cryptothreads.c into an existing source file such as src/or/relay.c.

 (2) From src/or/main.c, I call crypto_threads_init() (in cryptothreads.c)
 to initialize the events and thread pool handling.

 (3) In command_process_relay_cell() (src/or/command.c), I encapsulated and
 moved everything after the call to circuit_receive_relay_cell() into
 circuit_receive_relay_cell_post() (relay.c).  The idea was that
 circuit_receive_relay_cell() would eventually queue the crypto task, while
 circuit_receive_relay_cell_post() would still be executed in the context
 of the main thread, from the thread pool callback function.  In other
 words, command_process_relay_cell() needs to unwind and return to the
 event loop, and circuit_receive_relay_cell_post() is still called, but
 asynchronously.

 (4) I basically broke circuit_receive_relay_cell() (relay.c) into two
 parts: cryptothread_threadfn() and cryptothread_replyfn().
 cryptothread_threadfn() is run by a thread in the thread pool and calls
 down relay_crypt() -> relay_crypt_one_payload() ->
 crypto_cipher_crypt_inplace() and so forth into AES routines.  When
 cryptothread_threadfn() finishes, the main thread is signaled (through its
 event loop) that the task is complete, and the main thread then calls
 cryptothread_replyfn().  There is some glue to make this happen, such as
 queue_job_for_cryptothread() (relay.c) and replyqueue_process_cb()
 (cryptothreads.c), but it uses the existing src/common/workqueue.c
 implementation, as modeled by cpuworker.c.
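
 The threadfn/replyfn split in steps (3) and (4) can be sketched roughly as
 follows.  This is only an illustration under stated assumptions, not the
 code from my branch: it uses plain pthreads instead of
 src/common/workqueue.c, every name in it (crypt_job_t, demo_threadfn(),
 demo_replyfn(), demo_run_one()) is hypothetical, a toy XOR stands in for
 the AES routines, and a blocking wait stands in for the event loop
 notification that tor would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_LEN 8

/* Hypothetical job: carries its own copy of the payload so that the
 * worker never touches main-thread data structures. */
typedef struct crypt_job {
  uint8_t payload[PAYLOAD_LEN];
  uint8_t key;              /* toy key; stands in for the cipher state */
  int done;                 /* set by the reply function */
  struct crypt_job *next;   /* link in the reply queue */
} crypt_job_t;

/* Reply queue: workers push finished jobs, the main thread pops them. */
static pthread_mutex_t reply_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reply_cond = PTHREAD_COND_INITIALIZER;
static crypt_job_t *reply_head = NULL;

/* Worker side: do the "crypto" on the job only, then hand the job back. */
static void *demo_threadfn(void *arg)
{
  crypt_job_t *job = arg;
  for (size_t i = 0; i < PAYLOAD_LEN; i++)
    job->payload[i] ^= job->key;

  pthread_mutex_lock(&reply_lock);
  job->next = reply_head;
  reply_head = job;
  pthread_cond_signal(&reply_cond);
  pthread_mutex_unlock(&reply_lock);
  return NULL;
}

/* Main-thread side: the post-processing would go here. */
static void demo_replyfn(crypt_job_t *job)
{
  job->done = 1;
}

/* Queue one job, wait for it, and run the reply function in the calling
 * thread.  Returns 0 on success, -1 if the worker could not be started. */
int demo_run_one(crypt_job_t *job)
{
  pthread_t tid;
  crypt_job_t *finished;

  job->done = 0;
  job->next = NULL;
  if (pthread_create(&tid, NULL, demo_threadfn, job) != 0)
    return -1;

  /* A real event loop would be woken up instead of blocking here. */
  pthread_mutex_lock(&reply_lock);
  while (reply_head == NULL)
    pthread_cond_wait(&reply_cond, &reply_lock);
  finished = reply_head;
  reply_head = finished->next;
  pthread_mutex_unlock(&reply_lock);

  pthread_join(tid, NULL);
  assert(finished == job);
  demo_replyfn(finished);
  return 0;
}
```

 The point of the sketch is the ownership rule: the worker only reads and
 writes the job it was handed, and everything that touches other state
 happens in the reply function on the calling thread.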

 Initially, I did not think relay_crypt() accessed any resources shared
 with the main thread, so I have *NOT* added any synchronization around
 shared data, and I suspect this is the problem.  Most, if not all, access
 to shared data seemed to happen in the main thread's context after
 responding to an event (including in the thread pool callback function
 cryptothread_replyfn()), but admittedly I don't have a good grasp of the
 cell structures and cell/circuit queues used in the main thread.  I think
 I have reasoned incorrectly somewhere, since the differences between the
 refactored single-threaded version and the multithreaded version are
 relatively few.
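
 One candidate worth checking (an assumption on my part, not something I
 have confirmed in the code) is the per-circuit cipher state itself: a
 counter-mode stream cipher keeps a running position, so cells on the same
 circuit have to be crypted in the order they arrived.  The toy below uses
 hypothetical names (toy_cipher_t, toy_crypt_inplace(), toy_roundtrip_ok())
 and a made-up keystream rather than AES, and it is single-threaded on
 purpose: it only shows that handling two cells of one circuit out of order
 is enough to corrupt both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CELL_LEN 4

/* Toy stand-in for a counter-mode cipher: key plus running position. */
typedef struct toy_cipher {
  uint8_t key;
  uint32_t counter;
} toy_cipher_t;

/* XOR buf with the keystream and advance the shared counter. */
static void toy_crypt_inplace(toy_cipher_t *c, uint8_t *buf, size_t len)
{
  assert(c != NULL && buf != NULL);
  for (size_t i = 0; i < len; i++) {
    uint8_t ks = (uint8_t)(c->key ^ (uint8_t)(c->counter * 31u + 7u));
    buf[i] ^= ks;
    c->counter++;
  }
}

/* Encrypt two cells in order, then decrypt them either in order
 * (swap_order == 0) or reversed (swap_order != 0).
 * Returns 1 if both cells come back as the original plaintext. */
int toy_roundtrip_ok(int swap_order)
{
  const uint8_t plain_a[CELL_LEN] = { 'c', 'e', 'l', 'A' };
  const uint8_t plain_b[CELL_LEN] = { 'c', 'e', 'l', 'B' };
  uint8_t a[CELL_LEN], b[CELL_LEN];
  toy_cipher_t sender = { 0x5a, 0 };
  toy_cipher_t receiver = { 0x5a, 0 };

  memcpy(a, plain_a, CELL_LEN);
  memcpy(b, plain_b, CELL_LEN);

  /* Sender side: cell A uses positions 0..3, cell B uses 4..7. */
  toy_crypt_inplace(&sender, a, CELL_LEN);
  toy_crypt_inplace(&sender, b, CELL_LEN);

  /* Receiver side: the order in which the jobs happen to run. */
  if (swap_order) {
    toy_crypt_inplace(&receiver, b, CELL_LEN);
    toy_crypt_inplace(&receiver, a, CELL_LEN);
  } else {
    toy_crypt_inplace(&receiver, a, CELL_LEN);
    toy_crypt_inplace(&receiver, b, CELL_LEN);
  }

  return memcmp(a, plain_a, CELL_LEN) == 0 &&
         memcmp(b, plain_b, CELL_LEN) == 0;
}
```

 If this is what is going on, a mutex around the cipher would not be enough
 by itself, because the lock would prevent concurrent updates but not
 reordering; the jobs for a given circuit would also need to be serialized.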

 From what I remember (or perhaps assumed), the functionality in
 src/common/workqueue.c is properly synchronized, because it is already
 in use elsewhere (though perhaps less intensively?).

 Also, I have read the wiki article
 https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto
 but I have not fully reconciled its ideas with the newer(?)
 workqueue/cpuworker implementation.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/7572#comment:11>