[tor-bugs] #9299 [Tor]: Dump stack traces on assertion, crash, or general trouble
Tor Bug Tracker & Wiki
blackhole at torproject.org
Mon Oct 14 16:09:25 UTC 2013
#9299: Dump stack traces on assertion, crash, or general trouble
-----------------------------+---------------------------------
Reporter: nickm | Owner:
Type: enhancement | Status: needs_review
Priority: normal | Milestone: Tor: 0.2.5.x-final
Component: Tor | Version:
Resolution: | Keywords: tor-relay debugging
Actual Points: | Parent ID:
Points: |
-----------------------------+---------------------------------
Comment (by nickm):
Replying to [comment:5 lunar]:
> Ok, here's a review. I hope I'm not wasting anyone's time.
>
> Interesting how the new `format_dec_number_sigsafe()` and
`format_hex_number_sigsafe()` are slightly different given they do almost
the same thing.
Yeah; perhaps they should share more code.
> {{{
> +++ b/configure.ac
> crt_externs.h \
> + execinfo.h \
> grp.h \
> }}}
>
> Small whitespace issue.
Fixed
> {{{
> + tor_assert(0 * 0 == 1);
> }}}
>
> Leaving that in the history is probably bad for future bisection, no?
That's what the later "fixup!" in the commit message for
62645b5e18a8994c9257078f571d4227037a894c is for. See the "--autosquash"
argument for git rebase.
> `tor_log_err_sigsafe_helper()` might be better named
`tor_log_err_sigsafe_write()`
Renaming.
> {{{
> + if (log_time_granularity >= 2000) {
> + int g = log_time_granularity / 1000;
> }}}
>
> A comment for the magic numbers? You previously had 15 minutes. Does
this mean that the
> `stack_dump` file will have second granularity by default? Is that a
problem?
It means that the file will have the same granularity as the rest of the
system logs.
> {{{
> +tor_log_get_sigsafe_err_fds(const int **out)
> +{
> + *out = sigsafe_log_fds;
> + return n_sigsafe_log_fds;
> }}}
>
> Can't we have a race here if `tor_log_get_sigsafe_err_fds()` is called
and then `tor_log_update_sigsafe_err_fds()`? Or do the usage of the first
mandate `LOCK_LOGS`? Or are we sure there's no way this can happen?
Hm. During a signal handler, it's not safe to grab a lock, so we might
need to tolerate that, or use some kind of atomic set/fetch.
> `fd7aa1af` commit message does not say anything about the
`found_real_stderr` thing.
Added in a squash! commit.
> {{{
> + assert(n_sigsafe_log_fds >= 2);
> }}}
>
> Using `assert()` because `tor_assert()` will not be happy with an
unconfigured crash handler, right? Might worth a comment.
Added a comment. The other reason is that tor_assert logs, and logging
inside log functions is potentially a recursive nightmare.
> {{{
> +#ifdef PC_FROM_UCONTEXT
> +#if defined(__linux__)
> + const int n = 1;
> +#elif defined(__darwin__) || defined(__APPLE__) || defined(__OpenBSD__)
\
> + || defined(__FreeBSD__)
> + const int n = 2;
> +#endif
> }}}
>
> Maybe add an #else #error here ? The compiler should be unhappy because
`n` will be uninitialized, but it might save a little bit of time for a
future port.
Added
> {{{
> + puts("Argument should be \"assert\" or \"crash\"");
> }}}
>
> “or none”
Added
> {{{
> + return crash(x) + crash(x*2);
> }}}
>
> What's the point of testing the reader's knowledge about C evaluation
order? :)
Added a comment. The actual point is to prevent the compiler from adding
a tail-call optimization during the first call to crash().
>
>
> Running the test program with assert has the logging function in the
trace:
>
> {{{
> $ ./test-bt-cl assert
> Oct 11 22:55:39.705 [err] tor_assertion_failed_(): Bug:
src/test/test_bt_cl.c:36: crash: Assertion 1 == 0 failed; aborting.
> Oct 11 22:55:39.705 [err] Bug: Assertion 1 == 0 failed in crash at
src/test/test_bt_cl.c:36. Stack trace:
> Oct 11 22:55:39.705 [err] Bug: ./test-bt-cl(log_backtrace+0x35)
[0x7f3236a5e095]
> Oct 11 22:55:39.705 [err] Bug: ./test-bt-
cl(tor_assertion_failed_+0x9f) [0x7f3236a68a8f]
> Oct 11 22:55:39.705 [err] Bug: ./test-bt-cl(crash+0x79)
[0x7f3236a5de69]
> […]
> }}}
>
> Is that something we want to keep?
I'd say, "yes".
> For the other one:
>
> {{{
> $ ./test-bt-cl crash
> ============================================================
T=1381524943
> Tor died: Caught signal 11
> ./test-bt-cl(+0x8fc7)[0x7f156871afc7]
> ./test-bt-cl(crash+0x48)[0x7f156871ae38]
> ./test-bt-cl(crash+0x48)[0x7f156871ae38]
> ./test-bt-cl(oh_what+0x25)[0x7f156871ae95]
> }}}
>
> There's no symbol for the first function and `crash()` is there twice.
Is that expected?
Odd. This seems Linux-specific. Will investigate.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/9299#comment:6>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
More information about the tor-bugs
mailing list