[tor-bugs] #1749 [Core Tor/Tor]: Split relay and link crypto across multiple CPU cores

Fri Jan 11 02:19:36 UTC 2019

#1749: Split relay and link crypto across multiple CPU cores
-------------------------------------------------+-------------------------
 Reporter:  nickm                                |          Owner:
                                                 |  chelseakomlo
     Type:  project                              |         Status:
                                                 |  assigned
 Priority:  High                                 |      Milestone:  Tor:
                                                 |  unspecified
Component:  Core Tor/Tor                         |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  tor-relay, term-project-ideas,       |  Actual Points:
  threads, performance, 035-roadmap-master, 035  |
  -triaged-in-20180711                           |
Parent ID:                                       |         Points:  10
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------

Comment (by schroeder):

 There are early plans to distribute crypto operations across multiple
 cores, but there might be a better way.

 (I emailed before, but I just found the tiny reply link-button.)

 The ticket states the goal is to saturate the bandwidth available (by
 using all the cores as efficiently as possible).

 I don't understand why a relay needs to have a "main thread". Network
 traffic arrives as an async operation and can be sent back out
 asynchronously. So a final strategy shouldn't have a central thread. The
 main thread might still be needed for startup, runtime adjustment, and
 system upkeep, but not for the core network-crypto processing; that should
 never need to touch the main thread.

 The current proposal speaks about multi-threading crypto operations; let's
 call that "A) Speed - Speeding up processing of a single cell". Instead, I
 propose "B) Concurrency - Restructuring so multiple cells can be processed
 concurrently".

 A cell of data should arrive via an IO-completion thread on an arbitrary
 CPU core, have its crypto transformation applied on that same core, and
 then be dispatched onward via the network. This seems to be quite a simple
 approach, and I would expect the crypto code to remain the same
 "single-threaded" implementation.
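
 A minimal sketch of approach [B], assuming a thread pool stands in for the
 IO-completion threads. The names `CELL_LEN`, `toy_transform`, `handle_cell`
 and `process_cells` are hypothetical, and the XOR step is only a placeholder
 for the real relay crypto; none of this is taken from the Tor source.

```python
from concurrent.futures import ThreadPoolExecutor

CELL_LEN = 512  # illustrative cell length in bytes (assumption for this sketch)


def toy_transform(cell: bytes, key: int) -> bytes:
    """Stand-in for the relay crypto step. Plain XOR, NOT real cryptography."""
    return bytes(b ^ key for b in cell)


def handle_cell(cell: bytes, key: int) -> bytes:
    """Handle one cell start to finish on whichever worker picked it up."""
    transformed = toy_transform(cell, key)  # unchanged single-threaded code path
    return transformed  # returning the bytes stands in for sending them onward


def process_cells(cells, key, workers=4):
    """Approach [B]: several cells in flight at once, each on one worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input cells.
        return list(pool.map(lambda cell: handle_cell(cell, key), cells))
```

 Note that CPython threads share one interpreter lock, so this shows the
 structure (no central thread touches a cell) rather than a real speedup.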

 Approach [A] will have diminishing returns as the number of cores
 increases. You can only break up a cell unit of work so much until you're
 encrypting one byte per CPU core. However, with approach [B], if you have
 millions of CPU cores (as an extreme) you can be processing millions of
 cells concurrently. Therefore, I believe approach [B] would be more
 scalable.
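
 The scaling argument can be put as a small calculation. This is only a
 sketch: the function names are hypothetical, and the cell length and minimum
 chunk size are parameters you would have to supply, not values from the
 ticket.

```python
def usable_cores_split_cell(cores: int, cell_len: int, min_chunk: int) -> int:
    """Approach [A]: one cell is split into chunks of at least min_chunk bytes.

    The number of chunks caps how many cores can work on that cell.
    """
    return min(cores, cell_len // min_chunk)


def usable_cores_per_cell(cores: int, cells_in_flight: int) -> int:
    """Approach [B]: each whole cell is handled by one core.

    The number of cells waiting to be processed caps how many cores are busy.
    """
    return min(cores, cells_in_flight)
```

 For example, with an assumed 512-byte cell split down to one byte per core,
 `usable_cores_split_cell(1_000_000, 512, 1)` gives 512, while
 `usable_cores_per_cell(1_000_000, 1_000_000)` gives 1000000.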

 Understood and agreed. I suspected there would be circuit state to
 maintain. As you say, concurrent cells on the same circuit should be
 queued or thread-locked. I suspect thread-locking will be simple enough
 to make it the best approach.
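
 A minimal sketch of the thread-locking option, assuming each circuit owns
 one lock that guards its state. `Circuit`, `cells_seen`, `relay_cell` and
 `relay_all` are hypothetical names, and the counter merely stands in for
 whatever per-circuit crypto state has to be updated in sequence.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Circuit:
    """Hypothetical per-circuit state guarded by its own lock."""

    def __init__(self, circ_id: int):
        self.circ_id = circ_id
        self.lock = threading.Lock()
        self.cells_seen = 0  # stands in for per-circuit crypto state

    def relay_cell(self, cell: bytes) -> bytes:
        # Only one worker at a time may read and advance this circuit's state.
        with self.lock:
            seq = self.cells_seen
            self.cells_seen += 1
            # Placeholder transform that depends on the state (NOT real crypto).
            return bytes(b ^ (seq & 0xFF) for b in cell)


def relay_all(circuits, work, workers=4):
    """Process (circ_id, cell) pairs concurrently; circuits lock independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(
            pool.map(lambda item: circuits[item[0]].relay_cell(item[1]), work)
        )
```

 The lock serialises state updates within a circuit while leaving other
 circuits free to proceed. It does not by itself guarantee that cells on one
 circuit take the lock in arrival order, which is where the queueing option
 might still be needed.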

 Given that this is only a problem for the biggest nodes, a design should be
 chosen that is quick to implement and focuses on achieving the goals of
 such users, rather than on squeezing out every drop of performance for
 performance's sake. I believe this is that efficient and focused design.

 What do you think?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/1749#comment:29>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online