[tor-bugs] #7572 [Tor]: Make relay crypto run on multiple CPU cores
Tor Bug Tracker & Wiki
blackhole at torproject.org
Wed Dec 2 04:19:26 UTC 2015
#7572: Make relay crypto run on multiple CPU cores
-----------------------+------------------------------
Reporter: nickm | Owner: andrea
Type: defect | Status: new
Priority: High | Milestone: Tor: 0.2.???
Component: Tor | Version:
Severity: Normal | Resolution:
Keywords: tor-relay | Actual Points:
Parent ID: #1749 | Points:
Sponsor: |
-----------------------+------------------------------
Comment (by jsturgix):
I looked for an approach that I could generalize and apply to both the
relay_crypt() case and the circuit_package_relay_cell() case. At first
glance, I didn't see anything easy, and since there were already a number
of moving parts unfamiliar to me, I focused on the relay_crypt() case.
In general, this was my thought process and approach:
(1) I created new files src/or/cryptothreads.c and src/or/cryptothreads.h.
These are modeled after src/or/cpuworker.c and create the thread pool.
cpuworker.c is big and I thought cryptothreads.c might also become big.
As it turned out, it is small, so it might make sense to roll
cryptothreads.c into an existing source file such as src/or/relay.c.
(2) From src/or/main.c, I call crypto_threads_init() (in cryptothreads.c)
to initialize the events and thread pool handling.
(3) In command_process_relay_cell() (src/or/command.c), I encapsulated and
moved everything after the call to circuit_receive_relay_cell() into
circuit_receive_relay_cell_post() (relay.c). The idea was
circuit_receive_relay_cell() would eventually queue the crypto task, but
circuit_receive_relay_cell_post() would still be executed by the thread
pool callback function in the context of the main thread. In other words,
command_process_relay_cell() needs to unwind and return to the event loop,
while circuit_receive_relay_cell_post() is still called, but
asynchronously.
(4) I basically broke circuit_receive_relay_cell() (relay.c) into two
parts: cryptothread_threadfn() and cryptothread_replyfn().
cryptothread_threadfn() is run by a thread in the thread pool and calls
down through relay_crypt() -> relay_crypt_one_payload() ->
crypto_cipher_crypt_inplace() and so forth into the AES routines. When
cryptothread_threadfn() finishes, the main thread is signaled (through its
event loop) that the task is complete, and it then calls
cryptothread_replyfn(). Some glue makes this happen, such as
queue_job_for_cryptothread() (relay.c) and replyqueue_process_cb()
(cryptothreads.c), but it all uses the existing src/common/workqueue.c
implementation, following the model of cpuworker.c.
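As a rough illustration of the split described in (4), and not the actual
patch, the sketch below separates a worker-side function from a main-thread
reply function using plain pthreads. Every name in it (crypt_job_t,
sketch_threadfn(), sketch_replyfn(), sketch_process_replies(),
sketch_run()) is hypothetical, the XOR loop is only a stand-in for the AES
call chain, and it starts one thread per job and joins them instead of
using workqueue.c's pool and event-loop notification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAYLOAD_LEN 8
#define SKETCH_MAX_JOBS 16

/* Hypothetical job: carries its own private copy of the cell payload. */
typedef struct crypt_job {
  uint8_t payload[SKETCH_PAYLOAD_LEN];
  uint8_t key;             /* toy stand-in for per-hop cipher state */
  int replied;             /* set only by the reply function (main thread) */
  struct crypt_job *next;  /* link in the reply queue */
} crypt_job_t;

/* Reply queue shared by workers and the main thread, guarded by a mutex. */
static pthread_mutex_t reply_lock = PTHREAD_MUTEX_INITIALIZER;
static crypt_job_t *reply_head = NULL;

/* Worker side: touches only the job it was handed, then queues a reply. */
static void *sketch_threadfn(void *arg) {
  crypt_job_t *job = arg;
  for (size_t i = 0; i < SKETCH_PAYLOAD_LEN; i++)
    job->payload[i] ^= job->key;  /* toy cipher, NOT AES */
  pthread_mutex_lock(&reply_lock);
  job->next = reply_head;
  reply_head = job;
  pthread_mutex_unlock(&reply_lock);
  return NULL;
}

/* Main-thread side: the place where circuit/cell queues would be touched. */
static void sketch_replyfn(crypt_job_t *job) {
  job->replied = 1;
}

/* Main thread: detach the whole reply list under the lock, then run the
 * reply function on each job without holding the lock. */
static int sketch_process_replies(void) {
  pthread_mutex_lock(&reply_lock);
  crypt_job_t *list = reply_head;
  reply_head = NULL;
  pthread_mutex_unlock(&reply_lock);

  int handled = 0;
  while (list) {
    crypt_job_t *next = list->next;
    sketch_replyfn(list);
    list = next;
    handled++;
  }
  return handled;
}

/* Demo driver: a real event loop would be woken by the reply queue
 * instead of blocking in pthread_join(). Returns the replies handled. */
static int sketch_run(crypt_job_t *jobs, int njobs) {
  pthread_t tids[SKETCH_MAX_JOBS];
  assert(njobs >= 0 && njobs <= SKETCH_MAX_JOBS);
  int started = 0;
  for (; started < njobs; started++) {
    if (pthread_create(&tids[started], NULL, sketch_threadfn,
                       &jobs[started]) != 0)
      break;
  }
  for (int i = 0; i < started; i++)
    pthread_join(tids[i], NULL);
  return sketch_process_replies();
}
```

The point of the shape is that the worker never reaches outside its job
structure, and everything else happens in sketch_replyfn() on the main
thread. Whether relay_crypt() can be confined this way is the open
question.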
Initially, I did not think relay_crypt() accessed any resources shared with
the main thread, so I have *NOT* added any synchronization around shared
data, and I suspect this is the problem. All (or most?) access to shared
data seemed to happen in the main thread's context after responding to an
event (including the thread pool callback function
cryptothread_replyfn()), but admittedly I don't have a good grasp of the
cell structures and cell/circuit queues used in the main thread. I think
I have reasoned incorrectly somewhere, since the differences between the
refactored single-threaded version and the multithreaded version are
relatively few.
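One candidate for hidden shared state, offered as an assumption rather than
a diagnosis, is the cipher object itself: if the cipher passed to
crypto_cipher_crypt_inplace() is a per-circuit stream cipher whose state
advances with every cell, then two cells of the same circuit handled by
different threads would both depend on the order of processing. The toy
below (hypothetical names, a one-byte counter XOR instead of AES) only
shows that ordering effect in a single thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CELL_LEN 4

/* Hypothetical stand-in for per-circuit cipher state. */
typedef struct toy_cipher {
  uint8_t counter;  /* advances with every byte processed */
} toy_cipher_t;

/* Toy stream cipher: XOR each byte with a running counter (NOT AES). */
static void toy_crypt_inplace(toy_cipher_t *c, uint8_t *buf, size_t len) {
  assert(c != NULL && buf != NULL);
  for (size_t i = 0; i < len; i++)
    buf[i] ^= c->counter++;
}

/* Process cells A and B in both orders, each order starting from a fresh
 * state. Returns 1 if A's output depends on the order, 0 otherwise. */
static int toy_order_matters(void) {
  uint8_t a1[TOY_CELL_LEN] = {0}, b1[TOY_CELL_LEN] = {0};
  uint8_t a2[TOY_CELL_LEN] = {0}, b2[TOY_CELL_LEN] = {0};
  toy_cipher_t in_order = {0}, swapped = {0};

  toy_crypt_inplace(&in_order, a1, TOY_CELL_LEN);  /* A first, then B */
  toy_crypt_inplace(&in_order, b1, TOY_CELL_LEN);

  toy_crypt_inplace(&swapped, b2, TOY_CELL_LEN);   /* B first, then A */
  toy_crypt_inplace(&swapped, a2, TOY_CELL_LEN);

  return memcmp(a1, a2, TOY_CELL_LEN) != 0;
}
```

If that assumption holds, a mutex around the cipher would stop corruption
of the state but would not by itself restore per-circuit cell ordering.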
From what I remember (or perhaps assumed), the functionality in
src/common/workqueue.c is properly synchronized because it is already
being used (but less intensely?).
Also, I have read the wiki article
https://trac.torproject.org/projects/tor/wiki/org/projects/Tor/MultithreadedCrypto
but I have not fully merged these ideas with the newer(?)
workqueue/cpuworker implementation.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/7572#comment:11>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online