bug#41357: 28.0.50; GC may miss to mark calle safe register content
From: Andrea Corallo
Subject: bug#41357: 28.0.50; GC may miss to mark calle safe register content
Date: Sun, 17 May 2020 12:42:48 +0000
Hi all,
while debugging the native compiler I've been chasing a bug in a
configuration where the .eln files are compiled at speed 2 (-O2) and
emacs-core is compiled at -O0.
What is going on is that, in a function A inside a .eln, a Lisp_Object
is held in a register (r14). Function A calls other functions in
emacs-core until garbage collection is triggered.
Because emacs-core is compiled with -O0, GCC does not use any
callee-saved register there, so these registers never get pushed. The
value stays in r14 until we enter 'flush_stack_call_func', where we have
to push all registers and identify the end of the stack for marking.
We correctly push the callee-saved registers with
__builtin_unwind_init (), and on my machine we identify the top (end) of
the stack using __builtin_frame_address (0).
Here is where I think the issue arises: __builtin_frame_address on GCC 7
and 10 for x86_64 returns the base pointer and not the stack pointer [1].
As a consequence, the scanned range does not include the callee-saved
registers that we have just pushed.
In my case r14 gets pushed at address 0x7ffc47b95fa0, but in mark_stack
we scan the interval from 0x7ffc47b95fb0 (end) to 0x7ffc47b9a150
(bottom). This is because __builtin_frame_address returned rbp
(0x7ffc47b95fb0 in this case).
The consequence is that the object originally referenced by r14 is never
marked, so it gets freed, which leads to a crash.
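
To make the interval check concrete, here is a minimal sketch using only
the addresses quoted above. The helper in_scanned_range and the constant
names are hypothetical, introduced for illustration, and the half-open
range is an assumption about how a conservative scan of a
downward-growing stack would treat its bounds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when ADDR lies in the half-open range
   [END, BOTTOM) that a conservative scan of a downward-growing stack
   would visit.  */
static bool
in_scanned_range (uint64_t addr, uint64_t end, uint64_t bottom)
{
  return end <= addr && addr < bottom;
}

/* Addresses quoted in this report.  */
static const uint64_t r14_slot = UINT64_C (0x7ffc47b95fa0);    /* where r14 was pushed */
static const uint64_t scan_end = UINT64_C (0x7ffc47b95fb0);    /* value of the base pointer */
static const uint64_t scan_bottom = UINT64_C (0x7ffc47b9a150); /* bottom of the stack */
```

With these values, in_scanned_range (r14_slot, scan_end, scan_bottom) is
false: the slot holding r14 sits 0x10 bytes below the scanned end, so the
marker never looks at it.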
I think we are interested in obtaining the stack pointer and not the
base pointer. Unfortunately, what __builtin_frame_address does appears
not to be really portable:
https://gcc.gnu.org/onlinedocs/gcc/Return-Address.html
This bug is easy to observe in the native compiler with configurations
like this one (speed 2 for .eln, -O0 for core), but I believe it can
affect stock Emacs too if any caller of flush_stack_call_func has a
callee-saved register holding a reference to a live object that is not
present on the stack. This can get trickier, especially with LTO enabled.
For now I'm testing the simple attached patch, which seems to do the job
for me. It pushes the registers in 'flush_stack_call_func' and then
calls 'flush_stack_call_func1', whose frame address (rbp) now lies below
the addresses where those registers got pushed, so the scanned range
includes them.
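
The two-function idea can be sketched as follows. This is not the
attached patch, only an illustration under stated assumptions: every
name prefixed with sketch_, as well as recorded_stack_end and count_call,
is hypothetical, the code assumes GCC builtins and a downward-growing
stack, and the empty asm statement is my own assumption for keeping the
inner call out of tail position.

```c
#include <stddef.h>

/* Hypothetical stand-in for the "end of stack" value that the marker
   would later use as one bound of its conservative scan.  */
static void *recorded_stack_end;

/* Inner function: by the time it runs, the caller has already spilled
   its callee-saved registers, so they sit in the caller's frame, above
   this function's frame address on a downward-growing stack.  */
__attribute__ ((noinline)) static void
sketch_flush_stack_call_func1 (void (*func) (void *), void *arg)
{
  recorded_stack_end = __builtin_frame_address (0);
  func (arg);
}

/* Outer function: force the callee-saved registers onto the stack,
   then delegate to the inner function.  */
__attribute__ ((noinline)) static void
sketch_flush_stack_call_func (void (*func) (void *), void *arg)
{
  __builtin_unwind_init ();
  sketch_flush_stack_call_func1 (func, arg);
  /* Assumption: a statement after the call keeps it from being turned
     into a sibling call, which would pop the saved registers before
     FUNC runs.  */
  __asm__ volatile ("");
}

/* Hypothetical callback standing in for the marking routine.  */
static void
count_call (void *arg)
{
  ++*(int *) arg;
}

/* Runs the callback once through the two-function path and returns
   how many times it was invoked.  */
static int
sketch_demo (void)
{
  int calls = 0;
  sketch_flush_stack_call_func (count_call, &calls);
  return calls;
}
```

The point of the split is that the frame address is taken one call
deeper than the register spill, so a scan starting at recorded_stack_end
should cover the slots that __builtin_unwind_init () filled.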
I hope I'm not catastrophically wrong in this analysis; if I am,
I apologize for the noise.
Thanks
Andrea
[1] Reduced example, GCC 7 at -O0:

    void *
    foo (void)
    {
      __builtin_unwind_init ();
      return __builtin_frame_address (0);
    }

    foo:
            push    rbp
            mov     rbp, rsp
            push    r15
            push    r14
            push    r13
            push    r12
            push    rbx
            mov     rax, rbp
            pop     rbx
            pop     r12
            pop     r13
            pop     r14
            pop     r15
            pop     rbp
            ret
Attachment: 0001-Fix-Garbage-Collector-for-missing-calle-safe-registe.patch (text)