Re: [Libunwind-devel] 10% lost unwind traces on x86-64?
From: Lassi Tuura
Date: Tue, 9 Mar 2010 19:16:36 +0100
Hi,
Thanks Arun.
>> - Suspiciously large fraction of failures occur at (function+0), i.e. at
>> function entry address.
>
> This has been discussed before:
>
> http://thread.gmane.org/gmane.comp.lib.unwind.devel/284/focus=296
>
> There is a patch in that thread that might be useful for solving this.
Thanks, I'll try to digest that :-)
> For the remaining problems, I'd suggest:
>
> * Trying a new libc
>
> If the problem goes away, someone added missing unwind info.
I'll try on other systems, but I am afraid we're stuck with RHEL5 for now. If
in the end this is the only fix, I'll put that forward, but I don't really
expect they'd bite. At best I imagine we might get a handful of custom boxes
for profiling work, but it would be a real pain from a user support point of view.
> * Examining readelf -wf for the code in question
>
> This is a manual step. If you can prove that the compiler modified the
> stack pointer and forgot to generate unwind info, try testing a more
> recent compiler.
Yes, this is exactly what I was doing. I had GDB attached to the program, and
whenever my program detected an anomalous stack trace I made a full libunwind
stack dump, had GDB dump the same stack trace, and then manually inspected each
of the anomalies: the address, the disassembly and the readelf unwind dumps.
The result was the five categories. I have not come across anything else yet in
the hundreds of these I investigated. If you want the gory details I can post
them.
> * Signal frames libunwind doesn't understand
>
> I haven't seen weird calling conventions in practice yet. But signals
> are of two types:
>
> * IP points to the instruction *after* the one that triggered the signal
> * IP points to the instruction *before* the one that triggered the signal
>
> libunwind doesn't distinguish between the two yet.
Right, I think mine (SIGPROF) is of the first kind, i.e. the saved %rip is where
execution will resume. What does libunwind assume for the "ip" of the dwarf
cursor? Is it the instruction to be executed next, or the one already executed?
Maybe I can knock up a patch for this.
I've not had problems getting through the signal frame.
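To make the question concrete, here is a minimal sketch of the convention I would expect an unwinder to need. It is not libunwind code and I am not claiming libunwind works this way: the table, the addresses and the names proc_range, find_range and lookup_for_unwind are all made up for illustration. The assumption is that a return address in an ordinary call frame points after the call, so the lookup backs up one byte, while the saved ip of a signal frame is the instruction to be executed next and is used unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one unwind table entry (think: one FDE). */
struct proc_range {
    uintptr_t start;    /* first address covered */
    uintptr_t end;      /* one past the last address covered */
    const char *name;   /* label for illustration only */
};

/* Toy table of two adjacent functions; addresses are invented. */
static const struct proc_range toy_table[] = {
    { 0x1000, 0x1040, "caller" },
    { 0x1040, 0x1080, "callee" },
};

/* Return the entry covering addr, or NULL if no entry covers it. */
static const struct proc_range *find_range(uintptr_t addr)
{
    size_t n = sizeof toy_table / sizeof toy_table[0];
    for (size_t i = 0; i < n; i++)
        if (addr >= toy_table[i].start && addr < toy_table[i].end)
            return &toy_table[i];
    return NULL;
}

/*
 * Choose the address used to look up unwind info.
 * Ordinary call frame: ip is a return address (the instruction after
 * the call), so back up one byte to land inside the call itself.
 * Signal frame: the saved ip is the next instruction to execute, so
 * use it unchanged.
 */
static const struct proc_range *lookup_for_unwind(uintptr_t ip,
                                                  int is_signal_frame)
{
    assert(ip != 0);
    return find_range(is_signal_frame ? ip : ip - 1);
}
```

With this toy table, a sample taken at callee+0 (0x1040) resolves to callee only when it is treated as a signal frame. With the ip - 1 rule the same address is attributed to caller, and at 0x1000 nothing is found at all, which is the kind of failure at (function+0) I would expect if the two cases are not distinguished.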
Regards,
Lassi