bug#28211: Stack marking issue in multi-threaded code, 2020 edition
From: Ludovic Courtès
Subject: bug#28211: Stack marking issue in multi-threaded code, 2020 edition
Date: Thu, 12 Mar 2020 22:59:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
Hi!
I think I’ve found another race condition involving stack marking, as a
follow-up to <https://issues.guix.gnu.org/issue/28211> (this time on
3.0.1+, but the code is almost the same).
‘abort_to_prompt’ does this:
--8<---------------cut here---------------start------------->8---
fp = vp->stack_top - fp_offset;
sp = vp->stack_top - sp_offset;
/* Continuation gets nargs+1 values: the one more is for the cont. */
sp = sp - nargs - 1;
/* Shuffle abort arguments down to the prompt continuation.  We have
   to be jumping to an older part of the stack.  */
if (sp < vp->sp)
  abort ();
sp[nargs].as_scm = cont;
while (nargs--)
  sp[nargs] = vp->sp[nargs];
/* Restore VM regs */
vp->fp = fp;
vp->sp = sp;
vp->ip = vra;
--8<---------------cut here---------------end--------------->8---
What if ‘scm_i_vm_mark_stack’ walks the stack right before the ‘vp->fp’
assignment? At that point ‘vp->fp’ and ‘vp->sp’ still describe the old
frames, so the marker can decide that one of the just-assigned
‘sp[nargs]’ slots is dead and set it to SCM_UNSPECIFIED. By the time we
set ‘vp->fp’, the stack slot we just initialized has already been
overwritten by ‘scm_i_vm_mark_stack’. Down the road, we get something
like:
Wrong type to apply: #<unspecified>
I believe this race is what I’m seeing here (0x7ff7f838dda0 is set to
SCM_UNSPECIFIED while thread 2 is in ‘abort_to_prompt’):
--8<---------------cut here---------------start------------->8---
(rr) thread 5
[Switching to thread 5 (Thread 24572.24575)]
#0 scm_i_vm_mark_stack (vp=0x7ff7fd820b48, mark_stack_ptr=0x7ff7fc0ebf90,
mark_stack_limit=0x7ff7fc0fbec0) at vm.c:743
743 break;
(rr) list
738 break;
739 case SLOT_DESC_DEAD:
740 /* This value may become dead as a result of GC,
741 so we can't just leave it on the stack. */
742 sp->as_scm = SCM_UNSPECIFIED;
743 break;
744 }
745 }
746 sp = SCM_FRAME_PREVIOUS_SP (fp);
747 /* Inner frames may have a dead slots map for precise marking.
(rr) p sp->as_scm
$59 = #<unspecified>
(rr) p sp
$60 = (union scm_vm_stack_element *) 0x7ff7f838dda0
(rr) thread 2
[Switching to thread 2 (Thread 24572.24577)]
#0 0x00007ff7fdb7bb36 in __GI___sigsuspend (
set=set@entry=0x7ff7fe132720 <suspend_handler_mask>)
at ../sysdeps/unix/sysv/linux/sigsuspend.c:26
26 ../sysdeps/unix/sysv/linux/sigsuspend.c: No such file or directory.
(rr) frame 4
#4 0x00007ff7fe228f14 in abort_to_prompt (thread=0x7ff7fd820b40,
saved_mra=<optimized out>) at vm.c:1465
1465 sp[nargs] = vp->sp[nargs];
(rr) p sp
$61 = (union scm_vm_stack_element *) 0x7ff7f838dd90
(rr) p fp
$62 = (union scm_vm_stack_element *) 0x7ff7f838ddb0
(rr) p &sp[2]
$63 = (union scm_vm_stack_element *) 0x7ff7f838dda0
(rr) p vp->sp
$64 = (union scm_vm_stack_element *) 0x7ff7f838dcf0
(rr) p vp->fp
$65 = (union scm_vm_stack_element *) 0x7ff7f838dd08
(rr) p vp->stack_bottom
$66 = (union scm_vm_stack_element *) 0x7ff7f838a000
(rr) p vp->stack_top
$67 = (union scm_vm_stack_element *) 0x7ff7f838e000
--8<---------------cut here---------------end--------------->8---
Comments about this analysis?
How do we fix it? It’s a bit troubling that this is all lock-free. One
fix I can think of is to simply redo the ‘sp[nargs]’ assignments after
the ‘vp->fp’, ‘vp->sp’, and ‘vp->ip’ assignments.
Thoughts?
Ludo’.