tor callgrinds
Watson Ladd
watsonbladd at gmail.com
Sat Feb 17 21:41:25 UTC 2007
Nick Mathewson wrote:
> On Fri, Feb 16, 2007 at 05:35:50PM -0800, Christopher Layne wrote:
>> On Fri, Feb 16, 2007 at 02:00:00PM -0800, Christopher Layne wrote:
>>> Thought you guys might find this interesting. I did a couple of callgrind
>>> runs on 2 different tor builds, 1 using -Os and the other using -O3. The
>> So did a bit more research on spec'ing which cost models are default in
>> callgrind and now have it logging jumps, asm instructions, and l1/l2/dram
>> performance counters in the simulator. If anyone is interested in the
>> machine specifically, it's a 2.1 GHz Celeron-D (Prescott) running under
>> Linux 2.6.20. I've rebuilt openssl, libz, and libevent with cranked up
>> optimization/debug on, so more interesting things to look at.
>
> Hi, Chris! This is pretty neat stuff! If you can do more of this, it
> could help the development team know how to improve speed.
>
> (Sorry about the delay in answering; compiling kcachegrind took me way
> longer than it should have.)
>
> A few questions.
>
> 1. What version of Tor is this? Performance data on 0.1.2.7-alpha
> or on svn trunk would help a lot more than data for 0.1.1.x,
> which I think this is. (I think this is the 0.1.1.x series
> because all the compression seems to be happening in
> tor_gzip_compress, whereas 0.1.2.x does compression
> incrementally in tor_zlib_process.) There are already a lot of
> performance improvements (I think) in 0.1.2.7-alpha, but there
> might be possible regressions too, and I'd like to catch them
> before we release... whereas it is not likely that we'll do
> anything besides security and stability to 0.1.1.x, since it's
> supposed to be a stable series.
>
> 2. How is this server configured? A complete torrc would help.
>
> 3. To what extent does -O3 help over -O2? Most users seem to
> compile with -O2, so we should probably change our flags if the
> difference is nontrivial.
>
> 4. Supposedly, KCachegrind can also visualize oprofile output. If
> this is true, and you could get it working, it might give more
> accurate information as to actual timing patterns, with fewer
> Heisenberg effects. (Even raw oprofile output
> would help, actually.)
>
> Now, some notes on the actual data. Again, I'm guessing this is for
> Tor 0.1.1.x, so some of the results could be quite different for the
> development series, especially if we fixed some stuff (which I think
> we did) and especially if we introduced some stupid stuff (which
> happens more than I'd like).
>
> * It looks like most of our time is being spent, as an OR and
> directory server, in compression, AES, and RSA. To improve
> speed, our options are basically "make it faster" or "do it
> less" for each of these.
>
> * AES isn't going to get used much less: A relay server still
> needs to AES-ctr-crypt each cell it gets three times: once for
> TLS for link secrecy on the inbound link, once with a circuit
> key for long-range secrecy, and once for TLS for link security
> on the outbound link. This explains the pretty even breakdown
> between rijndaelEncrypt, _X86_AES_decrypt, and _X86_AES_encrypt
> in the results. (If you're not following me, read the design
> paper, or just trust me. ;) )
>
> [We could _maybe_ save the middle
> encryption in some cases by a trick similar to what we use for
> CREATE_FAST cells, but it would only get rid of 1/8 of the AES
> done by servers in toto, thus reducing the average server's A]
>
> * Making AES faster would be pretty neat; the right way to go
> about it is probably to look hard at how OpenSSL is doing it,
> and see whether it can't be improved. Then again, the OpenSSL
> team is pretty clever, and it's not likely that there is a lot
> of low-hanging fruit to exploit here.
>
> * So here's how RSA is getting used on my server right now:
>
> 0 directory objects signed,
> 1643 directory objects verified,
> 8 routerdescs signed,
> 20554 routerdescs verified,
> 38 onionskins encrypted,
> 37631 onionskins decrypted,
> 35148 client-side TLS handshakes,
> 29866 server-side TLS handshakes,
> 0 rendezvous client operations,
> 70 rendezvous middle operations,
> 0 rendezvous server operations.
>
> So it looks like verifying routers, decrypting onionskins, and
> doing TLS handshakes are the big offenders for RSA. We've
> already cut down onionskin decryption as much as we can except
> by having clients build circuits less often. To cut down on
> routerdesc verification, we need to have routers upload their
> descriptors and have authorities replace descriptors less often,
> and there's already a lot of work in that direction, but I don't
> know if I've seen any numbers recently. We could cut down on
> TLS handshakes by using sessions, but that could hurt forward
> secrecy badly if we did it in a naive way. (We could be smarter
> and use sessions with a very short expiration window, but it's
> not clear whether that would actually help: somebody would need
> to find out how frequent TLS disconnect/reconnects are in
> comparison to ).
We could also eliminate the indirection in the TLS handshakes. Currently
the ORs make a temporary cert which they sign with a long-term one.
Verifying this is a pain, but ORs don't notice. We could also use a
more efficient algorithm than we do now for the authentication of the
client to the OP.
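
To make the breakdown quoted above easier to eyeball, here is a small
sketch that turns the raw counts into shares of the total. The counts
are copied from Nick's list; the names rsa_ops and shares are only
illustrative. Note that this is a count of operations, not of CPU time:
private-key operations typically cost more than verifications, so the
shares are a rough guide at best.

```python
# Counts copied from the list quoted above; nothing here is measured anew.
rsa_ops = {
    "directory objects signed": 0,
    "directory objects verified": 1643,
    "routerdescs signed": 8,
    "routerdescs verified": 20554,
    "onionskins encrypted": 38,
    "onionskins decrypted": 37631,
    "client-side TLS handshakes": 35148,
    "server-side TLS handshakes": 29866,
    "rendezvous client operations": 0,
    "rendezvous middle operations": 70,
    "rendezvous server operations": 0,
}


def shares(counts):
    """Return each category's fraction of the total operation count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Print categories from most to least frequent.
for name, frac in sorted(shares(rsa_ops).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:32s} {rsa_ops[name]:6d}  {frac:6.1%}")
```

Weighting each category by a measured per-operation cost would be the
obvious next step, but I don't have those numbers.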
>
> * Making RSA faster could also be fun for somebody. The core
> multiplication functions in openssl (bn_mul_add_words and
> bn_sq_comba8) are already in assembly, but it's conceivable that
> somebody could squeeze a little more out of them, especially on
> newer platforms. (Again, though, this is an area that smart
> people have already spent a lot of time in.)
>
> * Finally, compression. Zlib is pretty tunable in how it makes
> the CPU/compression tradeoff, so it wouldn't be so hard to
> fine-tune the compression algorithm more thoroughly. Every
> admin I've asked, though, has said that they'd rather spend CPU
> to save bandwidth than vice versa. Another way to do less
> compression would be to make directory objects smaller and have
> them get fetched less often: there are some design proposals to
> do that in the next series, and I hope that people help beat
> them into some semblance of workability.
>
> Again, many thanks for this information; I hope we'll see more like it
> in the future!
>
> peace,
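
On the compression point: the CPU/size tradeoff Nick mentions is easy
to get a feel for with zlib's compression level parameter. The sketch
below uses Python's zlib module rather than Tor's own C code, and the
sample input is made up (it only vaguely resembles directory text), so
treat the function name compare_levels and the data as assumptions for
illustration, not as a benchmark of Tor.

```python
import time
import zlib


def compare_levels(data, levels=(1, 6, 9)):
    """Compress data at each zlib level; return (level, size, seconds) tuples."""
    results = []
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(compressed), elapsed))
    return results


# Hypothetical, repetitive sample standing in for directory-style text.
sample = b"".join(
    b"router example%d 10.0.0.%d 9001 0 9030\n" % (i, i % 256)
    for i in range(5000)
)

for level, size, seconds in compare_levels(sample):
    print(f"level {level}: {size} bytes out of {len(sample)}, {seconds * 1000:.2f} ms")
```

Timings and sizes will vary with the machine and the input, so real
directory objects would be needed to decide whether a different level
is worth it.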