AES performance results
Nick Mathewson
nickm at freehaven.net
Tue Feb 27 00:06:57 UTC 2007
Because a well-behaved Tor spends most of its time in AES, and because
our last AES benchmarks gave surprising results (basically, "OpenSSL
0.9.7 AES isn't very fast"), I thought it would be a good idea to
benchmark again before the next 0.1.2.x stable release.
SUMMARY: I found that OpenSSL 0.9.8[be] AES is uniformly faster than
OpenSSL 0.9.7[bf] AES. I found also that on the x86 hardware I have,
0.9.8e's AES implementation is significantly faster than our current
implementation, whereas our current implementation seems to be
slightly faster on PPC. I also found that -O3 helps a little
everywhere, and a lot in some places. That's the next thing I'll
look into.
METHODOLOGY: I wrote a stupid benchmark function in aes.c to encrypt a
million cell-sized chunks using our aes_crypt function, and timed it
with the unix "time" command. I did this three times for each
(computer, code) pair and took the median of the three runs.
Hardware, OpenSSL version, and gcc version are as noted. Everything
was built with -O2 except as noted.
The optimizations considered were as follows:

  {Not using OpenSSL}
    builtin:
        Use the reference "fast" copy of rijndaelEncrypt from
        rijndael-alg-fst.c version 3.
    use_rijndael_counter_optimization:
        As "builtin", but skip an encode/decode step when filling the
        AES buffer. (AES considers a 128-bit block as 4 32-bit
        integers; counter mode begins by encoding a 128-bit integer
        into a 128-bit block.)
        [This is what Tor does now.]
    <full unroll>
        As use_rijndael_counter_optimization, but also define the
        FULL_UNROLL macro in order to enable some loop unrolling.

  {Using OpenSSL}
    use_openssl_evp:
        Define the USE_OPENSSL_EVP macro in Tor's aes.c so that all
        crypto is handled by OpenSSL's EVP_EncryptUpdate() function.
    use_openssl_aes:
        Define the USE_OPENSSL_AES macro in Tor's aes.c so that all
        crypto is handled by OpenSSL's AES_encrypt() function.
RESULTS:

On Catbus, an Intel Core 2 Duo E6700, openssl 0.9.8b, gcc 4.1
    builtin:                             7.4s
    use_rijndael_counter_optimization:   7.3s
      + <full unroll>:                   6.8s
      + <full unroll, -O3>:              6.2s
    use_openssl_evp:                     5.3s
    use_openssl_aes:                     4.6s
      + <-O3>:                           4.4s

On Totoro, an Athlon XP 1700+ with openssl 0.9.7f, gcc 4.0
    builtin:                            17.5s
    use_rijndael_counter_optimization:  17.3s
      + <-O3>:                          17.3s
      + <full unroll>:                  18.6s
      + <full unroll, -O3>:             18.2s
    use_openssl_evp:                    23.0s
    use_openssl_aes:                    21.2s
      + <-O3>:                          20.2s
    use_openssl_aes, with 0.9.8e:       10.9s
      + <-O3>:                          10.0s

On Kushana, 1.33 GHz G4 with openssl 0.9.7b, gcc 4.0
    builtin:                            11.9s
    use_rijndael_counter_optimization:  11.1s
      + <full unroll>:                  10.7s
      + <full unroll, -O3>:             10.7s
    use_openssl_evp:                    17.2s
    use_openssl_aes:                    13.3s
      + <-O3>:                          12.9s
    use_openssl_aes, with 0.9.8e:       12.0s
      + <-O3>:                          11.6s
CONCLUSIONS:
In the face of OpenSSL 0.9.7f or earlier, it is a good idea to
continue with our current approach. FULL_UNROLL helps in some places,
but not in others. -O3 helps a little. Our current approach does
around 15% better than the fastest OpenSSL-0.9.7f-based approach.
With OpenSSL 0.9.8b or later, on x86 platforms, it is a big win to
use OpenSSL's AES_encrypt; it is about 37% faster than what we're
doing now. Using -O3 helps a little.
On PPC G4, our current approach is still faster than OpenSSL, but
only by about 8% as opposed to 16% with OpenSSL 0.9.7. FULL_UNROLL
is a good idea here.
So, the code should basically do

    #if (recent openssl && (x86 || x86_64))
    #  define USE_OPENSSL_AES
    #elif (PPC)
    #  define USE_RIJNDAEL_COUNTER_OPTIMIZATION
    #  define FULL_UNROLL   /* maybe */
    #else
    #  define USE_RIJNDAEL_COUNTER_OPTIMIZATION
    #endif

Depending on what profiling method and what workload you use, we
spend between 8% and 20% of our time in aes_crypt; if these results
hold in the field, cutting that time by about 37% will save us
between 3 and 7% of our total CPU time on x86. Not bad.
THANKS:
To Ben Laurie for confirming that I'm not nuts here.
To Andy Polyakov, who, Ben tells me, is to thank for OpenSSL's asm
AES implementations.
And to the people who've been writing profiling-related mail to the
list.
peace,
--
Nick Mathewson