bug-gnu-emacs

bug#41357: 28.0.50; GC may miss to mark calle safe register content


From: Andrea Corallo
Subject: bug#41357: 28.0.50; GC may miss to mark calle safe register content
Date: Sun, 17 May 2020 12:42:48 +0000

Hi all,

While debugging the native compiler I've been chasing a bug in a
configuration where the .eln files are compiled at speed 2 (-O2) and
emacs-core is compiled at -O0.

What happens is that in a .eln, in a function A, a Lisp_Object is held
in a register (r14).  Function A calls other functions in emacs-core
until garbage collection is triggered.

Because emacs-core is compiled with -O0, GCC does not select any
callee-saved register, and therefore these never get pushed.  The value
stays in r14 until we enter 'flush_stack_call_func', where we have to
push all registers and identify the end of the stack for marking.

We correctly push the callee-saved registers with
__builtin_unwind_init (), and on my machine we identify the top (end) of
the stack using __builtin_frame_address (0).

Here, I think, is where the issue arises: __builtin_frame_address on
GCC 7 and 10 for x86_64 returns the base pointer and not the stack
pointer [1].  As a consequence the scanned range does not include the
callee-saved registers that we have just pushed.

In my case r14 gets pushed at address 0x7ffc47b95fa0, but in mark_stack
we are scanning the interval from 0x7ffc47b95fb0 (end) to 0x7ffc47b9a150
(bottom).  This is because __builtin_frame_address returned rbp
(0x7ffc47b95fb0 in this case).
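
The arithmetic can be checked with a small sketch.  The helper name and
the half-open interval convention below are assumptions made for
illustration, not taken from mark_stack; only the three addresses come
from the report above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if ADDR lies in the scanned range
   [END, BOTTOM).  The half-open convention is an assumption.  */
static bool
in_scanned_range (uint64_t addr, uint64_t end, uint64_t bottom)
{
  return end <= addr && addr < bottom;
}

/* Addresses reported in this message.  */
static const uint64_t r14_slot = 0x7ffc47b95fa0;    /* where r14 was pushed */
static const uint64_t scan_end = 0x7ffc47b95fb0;    /* value of rbp */
static const uint64_t scan_bottom = 0x7ffc47b9a150; /* bottom of the stack */
```

The r14 slot sits 0x10 bytes below rbp, which matches the second push
after 'mov rbp, rsp' in the reduced example [1], so
in_scanned_range (r14_slot, scan_end, scan_bottom) is false.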

The consequence is that the object originally referenced by r14 is never
marked, so it gets freed, and this leads to a crash.

I think we are interested in obtaining the stack pointer and not the
base pointer; unfortunately, what __builtin_frame_address does appears
not to be really portable:

https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html

This bug is easy to observe in the native compiler with configurations
like this (speed 2 for .eln, -O0 for core), but I believe it can affect
stock Emacs too if any caller of flush_stack_call_func has a
callee-saved register holding a reference to a live object that is not
present on the stack.  This can get trickier, especially with LTO
enabled.

For now I'm testing the simple attached patch, which seems to do the job
for me.  It pushes the registers in 'flush_stack_call_func' and then
calls 'flush_stack_call_func1', where the range starting at rbp must now
include the addresses where those registers were pushed.
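
The two-step idea can be sketched as follows.  This is not the attached
patch: the function and variable names are hypothetical, the code
assumes GCC (or a compiler providing the same builtins) and a stack that
grows towards lower addresses, and it only records the frame address
instead of marking anything.

```c
#include <stdint.h>

/* Hypothetical stand-in for the end-of-stack address handed to the
   marker.  */
static void *recorded_end;

/* Inner step: runs in its own frame, created after the caller has
   spilled the callee-saved registers, so on a downward-growing stack
   its frame address is below those slots.  */
__attribute__ ((noinline)) static void
record_end_inner (void)
{
  recorded_end = __builtin_frame_address (0);
}

/* Outer step: spill the callee-saved registers into this frame, then
   call the inner step.  Returning this frame's own address after the
   call keeps the call from becoming a tail call, which would pop the
   spilled registers before the inner step runs.  */
__attribute__ ((noinline)) static void *
record_end_outer (void)
{
  __builtin_unwind_init ();
  record_end_inner ();
  return __builtin_frame_address (0);
}
```

With this split, a scan starting at recorded_end and moving towards
higher addresses passes over the registers pushed by the outer function
before reaching the outer frame address.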

I hope I'm not catastrophically wrong in this analysis; if I am, I
apologize for the noise.

Thanks

  Andrea

[1] Reduced example. GCC7 -O0

void *
foo (void)
{
  __builtin_unwind_init ();
  return __builtin_frame_address (0);
}

foo:
        push    rbp
        mov     rbp, rsp
        push    r15
        push    r14
        push    r13
        push    r12
        push    rbx
        mov     rax, rbp
        pop     rbx
        pop     r12
        pop     r13
        pop     r14
        pop     r15
        pop     rbp
        ret

Attachment: 0001-Fix-Garbage-Collector-for-missing-calle-safe-registe.patch