[Libunwind-devel] Backtrace performance on x86-64 based on Dwarf info
From: Milian Wolff
Subject: [Libunwind-devel] Backtrace performance on x86-64 based on Dwarf info
Date: Wed, 28 May 2014 14:40:41 +0200
User-agent: KMail/4.13.1 (Linux/3.14.4-1-ARCH; KDE/4.13.1; x86_64; ; )
Hello list!
tl;dr: At the end of a backtrace, I encounter an IP=0x0 which triggers a very
slow path in libunwind.
Long version:
I'm experimenting with libunwind and it's working really well for me so far. My
code can be found in [1] if you are interested. One issue I have so far is the
performance of it all. I do local unwinding on an x86-64 machine, i.e. no frame
pointers, only Dwarf info around. From libunwind, I just want to extract the
backtrace as a list of instruction pointers; see the void trace()
function in [1], starting at line 87.
When I run my tracer on any bigger application with many allocations in
potentially multiple threads, the performance is bad. Worse, even, than running
the application under Valgrind's massif! I did not expect that, assuming that
Valgrind's CPU emulation and thread synchronization would be far slower.
Profiling my tracer with perf shows tons of time spent in
_ULx86_64_dwarf_find_proc_info and below. Especially the syscalls related to
sigprocmask are a hotspot:
# Samples: 20K of event 'syscalls:sys_enter_rt_sigprocmask'
# Event count (approx.): 20153
#
# Overhead Command Shared Object Symbol
# ........ ........ .................. ................................
#
99.49% test_c libc-2.19.so [.] __sigprocmask
|
--- __sigprocmask
|
|--99.83%-- _ULx86_64_dwarf_find_proc_info
| fetch_proc_info.lto_priv.18
| _ULx86_64_dwarf_find_save_locs
| _ULx86_64_step
|          (anonymous namespace)::trace(int) [clone .constprop.250]
|          (anonymous namespace)::Data::handleMalloc(void*, unsigned long)
| malloc
| main
| __libc_start_main
| _start
| 0x0
--0.17%-- [...]
I then tried to figure out what's going on in libunwind; keep in mind that I
have no experience with the code base whatsoever. I recompiled libunwind with
debug mode enabled and ran my trace script with UNW_DEBUG_LEVEL=14. At the end
of every backtrace, I then see something like this:
>_ULx86_64_init_local: (cursor=0x7fff0c813870)
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f5034e46b2d, cfa=0x00007fff0c8134b0)
>_ULx86_64_reuse_frame: reuse frame ip=0x7f5034e46b2d cfa=0x7fff0c8134b0 format=0 addr=0x0 offset=+0
>_ULx86_64_step: returning 1
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f5034e47c1e, cfa=0x00007fff0c813ca0)
>_ULx86_64_reuse_frame: reuse frame ip=0x7f5034e47c1e cfa=0x7fff0c813ca0 format=0 addr=0x0 offset=+0
>_ULx86_64_step: returning 1
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f5034e484cd, cfa=0x00007fff0c813d50)
>_ULx86_64_reuse_frame: reuse frame ip=0x7f5034e484cd cfa=0x7fff0c813d50 format=0 addr=0x0 offset=+0
>_ULx86_64_step: returning 1
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x000000000040094c, cfa=0x00007fff0c813d70)
>_ULx86_64_reuse_frame: reuse frame ip=0x40094c cfa=0x7fff0c813d70 format=0 addr=0x0 offset=+0
>_ULx86_64_step: returning 1
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x00007f50348a1000, cfa=0x00007fff0c813d90)
>_ULx86_64_reuse_frame: reuse frame ip=0x7f50348a1000 cfa=0x7fff0c813d90 format=0 addr=0x0 offset=+0
>_ULx86_64_step: returning 1
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x0000000000400829, cfa=0x00007fff0c813e50)
>_ULx86_64_reuse_frame: reuse frame ip=0x400829 cfa=0x7fff0c813e50 format=0 addr=0x0 offset=+0
>_ULx86_64_step: returning 1
>_ULx86_64_step: (cursor=0x7fff0c813870, ip=0x0000000000000000, cfa=0x00007fff0c813e58)
>_ULx86_64_dwarf_find_proc_info: looking for IP=0xffffffffffffffff
>_ULx86_64_dwarf_find_proc_info: IP=0xffffffffffffffff not found
>_ULx86_64_step: dwarf_step() failed (ret=-10), trying frame-chain
>_ULx86_64_step: NULL %rbp loc, returning 0
So, if I'm not misunderstanding that output, the very last frame triggers a
call to dwarf_find_proc_info, leading to the performance issue.
Looking into it, I added debug output to the DWARF_IS_NULL_LOC branch in
Gstep.c's unw_step. That branch is never taken; yet still, somewhere along the
way, ip is set to 0 beforehand.
Indeed, if I add a check like this:

  if (!c->dwarf.ip)
    {
      return 0;
    }
to the beginning of unw_step, the excessive calls to sigprocmask etc. are
gone. Cool! But is this a valid approach? And generally, why is there a zero IP
at all?
The good thing is that I can now check for that value in my trace() function
and break the loop, which avoids the final costly call to unw_step.
[1]:
http://quickgit.kde.org/?p=scratch%2Fmwolff%2Fmallocinfo.git&a=blob&h=46a785aa694c5f38bee56a5d78cba0016fb16fc1&hb=3f36142ac44cd43bac34b1d466c334b4fcb7e728&f=malloctrace.cpp
--
Milian Wolff
address@hidden
http://milianw.de