[tor-bugs] #1749 [Core Tor/Tor]: Split relay and link crypto across multiple CPU cores
Tor Bug Tracker & Wiki
blackhole at torproject.org
Fri Jan 11 02:19:36 UTC 2019
#1749: Split relay and link crypto across multiple CPU cores
-------------------------------------------------+-------------------------
Reporter:   nickm                                | Owner:         chelseakomlo
Type:       project                              | Status:        assigned
Priority:   High                                 | Milestone:     Tor: unspecified
Component:  Core Tor/Tor                         | Version:
Severity:   Normal                               | Resolution:
Keywords:   tor-relay, term-project-ideas,       | Actual Points:
            threads, performance,                |
            035-roadmap-master,                  |
            035-triaged-in-20180711              |
Parent ID:                                       | Points:        10
Reviewer:                                        | Sponsor:
-------------------------------------------------+-------------------------
Comment (by schroeder):
There are early plans to distribute crypto operations across multiple
cores, but there might be a better way.
(I emailed this earlier, but I only just found the tiny reply button.)
The ticket states that the goal is to saturate the available bandwidth by
using all the cores as efficiently as possible.
I don't understand why a relay needs to have a "main thread". Network
traffic arrives asynchronously and can be sent back out asynchronously,
so the final design shouldn't need a central thread. A main thread might
still be needed for startup, runtime adjustment, and system upkeep, but
the core network and crypto processing should never need to touch it.
The current proposal talks about multi-threading the crypto operations;
let's call that "A) Speed: speeding up the processing of a single cell".
Instead, I propose "B) Concurrency: restructuring so that multiple cells
can be processed concurrently".
Each cell should arrive via an I/O completion thread on an arbitrary CPU
core, have its crypto transformation applied on that same core, and then
be dispatched onward to the network. This seems a fairly simple approach,
and I would expect the crypto code could keep its existing single-threaded
implementation.
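A minimal sketch of approach [B] follows. It uses a Go goroutine pool as a stand-in for a relay's real event loop.

- A channel plays the role of the I/O completion queue.
- `transformCell` is a hypothetical placeholder for the unchanged single-threaded crypto routine. It is a constant XOR, not real relay crypto.
- Each worker handles a whole cell from start to finish. The feeder goroutine and the collecting loop only stand in for the network on either side.
- Per-circuit state is deliberately ignored here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Cell is a hypothetical stand-in for a relay cell: an ID plus payload bytes.
type Cell struct {
	ID      int
	Payload []byte
}

// transformCell stands in for the existing single-threaded crypto routine.
// PLACEHOLDER ONLY: a constant XOR, not real relay crypto.
func transformCell(c Cell) Cell {
	out := make([]byte, len(c.Payload))
	for i, b := range c.Payload {
		out[i] = b ^ 0x5a
	}
	return Cell{ID: c.ID, Payload: out}
}

// runWorkers starts `workers` goroutines. Each takes one whole cell from
// `incoming` (standing in for an I/O completion queue), transforms it on
// that same worker, and sends it onward. It closes `outgoing` when done.
func runWorkers(incoming <-chan Cell, outgoing chan<- Cell, workers int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range incoming {
				outgoing <- transformCell(c)
			}
		}()
	}
	wg.Wait()
	close(outgoing)
}

// processAll feeds cells in, runs one worker per CPU core, and collects the
// transformed payloads keyed by cell ID.
func processAll(cells []Cell) map[int][]byte {
	incoming := make(chan Cell)
	outgoing := make(chan Cell, len(cells))
	go func() {
		for _, c := range cells {
			incoming <- c
		}
		close(incoming)
	}()
	go runWorkers(incoming, outgoing, runtime.NumCPU())

	results := make(map[int][]byte)
	for c := range outgoing {
		results[c.ID] = c.Payload
	}
	return results
}

func main() {
	cells := []Cell{{1, []byte("abc")}, {2, []byte("def")}, {3, []byte("ghi")}}
	fmt.Println(len(processAll(cells)))
}
```

Running it prints `3`. The parallelism here comes from having many cells in flight at once, not from splitting any single cell.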
Approach [A] will have diminishing returns as the number of cores
increases: a cell can only be split into so many pieces before each core
is encrypting a single byte. With approach [B], however, if you had
millions of CPU cores (as an extreme), you could process millions of cells
concurrently. I therefore believe approach [B] would be more scalable.
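The scaling argument can be made concrete with a toy calculation. This sketch only counts how many cores can be kept busy and ignores all synchronization cost.

- Under [A], useful cores are capped by how many chunks one cell can be split into.
- Under [B], useful cores are capped by how many cells are in flight.

Some inputs and names are assumptions:

- The 509-byte figure is Tor's relay cell payload size.
- The one-byte chunk mirrors the extreme case above.
- The in-flight count is hypothetical.
- The function names are made up for illustration.

```go
package main

import "fmt"

// minInt returns the smaller of two ints (avoids depending on Go 1.21's min).
func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// maxCoresApproachA: cores that can work on ONE cell when it is split into
// chunks of chunkBytes. Capped by the number of chunks (ceiling division).
func maxCoresApproachA(cores, cellBytes, chunkBytes int) int {
	chunks := (cellBytes + chunkBytes - 1) / chunkBytes
	return minInt(cores, chunks)
}

// maxCoresApproachB: cores kept busy when each core processes whole cells.
// Capped by the number of cells in flight at the same time.
func maxCoresApproachB(cores, cellsInFlight int) int {
	return minInt(cores, cellsInFlight)
}

func main() {
	const cellPayload = 509  // bytes per relay cell payload
	const inFlight = 1000000 // hypothetical number of cells in flight
	for _, cores := range []int{4, 64, 1024, 1000000} {
		fmt.Printf("cores=%d  A=%d  B=%d\n", cores,
			maxCoresApproachA(cores, cellPayload, 1),
			maxCoresApproachB(cores, inFlight))
	}
}
```

At 1024 cores, approach [A] tops out at 509 busy cores. Approach [B] keeps all 1024 busy, as long as at least that many cells are in flight.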
Understood and agreed. I suspected there would be circuit state to
maintain. As you say, concurrent cells on the same circuit should be
queued or thread-locked. I suspect thread-locking will be simple enough
to be the best approach.
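Per-circuit locking can be sketched the same way. This assumes each circuit owns a `sync.Mutex` guarding its state.

- `Circuit`, `cellsSeen`, `handleCell`, and `simulate` are hypothetical names.
- The counter stands in for real per-circuit crypto state.

Cells on different circuits run in parallel, while cells on the same circuit take turns.

```go
package main

import (
	"fmt"
	"sync"
)

// Circuit is a hypothetical per-circuit record: a mutex guarding mutable
// state. cellsSeen stands in for real per-circuit crypto state, which must
// not be updated by two cores at once.
type Circuit struct {
	mu        sync.Mutex
	cellsSeen int
}

// handleCell locks only this cell's circuit, so different circuits proceed
// in parallel while cells on the same circuit are serialized.
func handleCell(c *Circuit) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cellsSeen++ // placeholder for applying this circuit's crypto to the cell
}

// simulate dispatches cellsPerCircuit cells to each circuit, one goroutine
// per cell, and waits for all of them to finish.
func simulate(numCircuits, cellsPerCircuit int) []*Circuit {
	circuits := make([]*Circuit, numCircuits)
	for i := range circuits {
		circuits[i] = &Circuit{}
	}
	var wg sync.WaitGroup
	for i := 0; i < cellsPerCircuit; i++ {
		for _, c := range circuits {
			wg.Add(1)
			go func(c *Circuit) {
				defer wg.Done()
				handleCell(c)
			}(c)
		}
	}
	wg.Wait()
	return circuits
}

func main() {
	for i, c := range simulate(3, 1000) {
		fmt.Println(i, c.cellsSeen)
	}
}
```

Each circuit ends with exactly 1000 cells counted. A mutex only prevents simultaneous updates; it does not by itself preserve the order in which cells arrived. If cells on a circuit must be handled in arrival order, the queued variant would be needed instead.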
Given that this is only a problem for the biggest nodes, we should choose
a design that is quick to implement and focuses on meeting those
operators' goals, rather than squeezing out every drop of performance for
performance's sake. I believe this is such an efficient, focused design.
What do you think?
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/1749#comment:29>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online