hardware acceleration available for Tor ? On FreeBSD ?

coderman coderman at gmail.com
Sat Oct 17 10:26:05 UTC 2009


On Mon, Oct 12, 2009 at 8:24 PM, John Case <case at sdf.lonestar.org> wrote:
> ...
> You discuss the performance benefits of using the AES CTR in hardware below,
> but before I get to that, I wonder if you experienced the same as what is
> quoted just above:  that the real CPU load occurs during "the asymmetric-key
> processing (i.e., onion-skin encrypting/decrypting)" and that that is the
> only area where we can attempt to speed things up with hardware ?

speeding up OpenSSL derived CTR mode with hardware acceleration is
useful. (the HardwareAccel and EngineName options above)

speeding up Tor with native hardware acceleration and chained /
repetitive CTR offload is a much bigger win. as an example the VIA
Padlock engine can do a few score MBytes/sec aes 128 with small/single
block sizes like those used when HardwareAccel is on and EngineName
padlock set. if you could utilize the maximum 16KByte rep instruction
mode directly goes well over a gigabyte / second. this is also why the
native Solaris API CTR mode gives much better performance than the
OpenSSL engine acceleration.

speeding up OpenSSL pubkey ops also helps. as of OpenSSL 0.9.9 and
1.0.x you can build OpenSSL with padlock MONTMULT acceleration. still
no dynamic engine support for pubkey ops iirc...


> So, if I understand correctly, the HardwareAccel option can be turned on by
> anyone, regardless of OS or hardware platform.  It will then (probably) just
> do AES CTR with OpenSSL (since most people use OpenSSL) and just get no
> benefit because OpenSSL won't do AES CTR through hardware.

there will be some benefit as Tor wraps a CTR mode around
(accelerated) OpenSSL ECB mode. the problem is that this method is
limited to single blocks per call, rather than long buffers optimized
for crypto offload.

you may also get SHA acceleration. the notices log should say, for example:
[info] crypto_global_init(): Initializing OpenSSL engine support.
[info] crypto_global_init(): Initializing dynamic OpenSSL engine
"padlock" acceleration support.
[info] crypto_global_init(): Loaded dynamic OpenSSL engine "padlock".
[info] crypto_global_init(): Loaded OpenSSL hardware acceleration
engine, setting default ciphers.
[info] Using default implementation for RSA
[info] Using default implementation for DH
[info] Using default implementation for RAND
[notice] Using OpenSSL engine VIA PadLock: RNG (not used) ACE2
PHE(8192) PMM  [padlock] for SHA1
[info] Using default implementation for 3DES
[notice] Using OpenSSL engine VIA PadLock: RNG (not used) ACE2
PHE(8192) PMM  [padlock] for AES


> So, even if I got a BCM5825 working with with the ubsec driver on FreeBSD,
> and used HardwareAccel, it would just be wasted with OpenSSL.

not necessarily, per above :)


> On the other hand, there are Solaris-specific routines (crypto framework
> APIs (PKCS#11)) built into Solaris that Tor can call instead of OpenSSL,
> which _will_ do AES CTR in hardware, yielding a huge gain in performance
> (you mention 25x).
>
> Do I have all of that correct ?

it would also be nice to have these routines native for other engines,
like padlock. however this is not a small amount of effort and no one
with time and skill has shown an interest yet. (Tor buffer allocation
alignment to 16K, inline padlock instr. calls in REP mode, and other
changes required.)


> - Are there analogues to the "crypto framework APIs (PKCS#11)" in other OS's
> ?  What is this layer called for Linux, for instance ?

there are kernel cryptographic facilities including asm optimized and
hardware accelerated crypto primitives. these are usually not utilized
from userspace directly. there are PKCS specs for communicating with
offload devices in a concise manner, and libs of various types to
speak this to myraid hardware. perhaps a longer discussion should be
taken off list...


> - How does the T2 (Niagara 2) compare to dedicated hardware such as the Sun
> Crypto 6000 which is currently available ?

the T2 is core acceleration and thus higher throughput for most
purposes. the 6000 is useful as a hardened private key store service
for your less trusted OS and software to cooperate with.


> - Am I correct that if a new version of OpenSSL appeared with AES CTR
> hardware support, an end user could just proceed blindly with card
> insertion, driver install, add HardwareAccel=1, and poof!  Yes ?

Tor would need to be updated to take advantage of the new CTR mode in
the OpenSSL API calls, but then the same HardwareAccel (and AccelName)
options would provide a significant improvement over the previous
acceleration mode.

best regards,
***********************************************************************
To unsubscribe, send an e-mail to majordomo at torproject.org with
unsubscribe or-talk    in the body. http://archives.seul.org/or/talk/



More information about the tor-talk mailing list