bug#38748: 28.0.50; crash on MacOS 10.15.2

From: Eli Zaretskii
Subject: bug#38748: 28.0.50; crash on MacOS 10.15.2
Date: Fri, 10 Jan 2020 10:27:45 +0200

> From: Pip Cet <address@hidden>
> Date: Fri, 10 Jan 2020 07:32:07 +0000
> Cc: address@hidden, address@hidden, address@hidden, 
>       address@hidden, address@hidden
> > The backtrace shows a very recursive GC, it doesn't show any other
> > function being deeply recursive.  So I'm not sure I understand what
> > tail-recursive function did you have in mind.  Can you elaborate?
> I can. I think we're looking at two bugs: the first is the simple
> use-after-free of XFRAME (frame)->output_data.ns where `frame' is a
> dead frame. I've confirmed on GNU/Linux that mark_frame is called for
> a frame for which x_free_frame_resources has already been called, if
> there's a global variable still referencing the frame. I think the
> same thing happens on macOS.

This one doesn't depend on the 'ok's initialization in
face_inherited_attr in any way, does it?

> 1. I think face_inherited_attr is being optimized to tail-call itself
> rather than calling itself in a new stack frame; thus, it loops
> indefinitely for a faulty face setup which would otherwise lead to an
> immediate crash.
> 1b. that optimization only works without the harmless initialization of "ok".
> 2. Our initial face setup is faulty in the sense above.
> 3. Something happens on a secondary thread which causes our face setup
> to become non-faulty, possibly during GC.

What do you mean by "secondary thread"?  And how can GC modify Lisp
data structures?  That'd be a terrible bug.

In any case, the full backtrace shows no trace of a face_inherited_attr
call anywhere in the call stack, so if there is indeed infinite
recursion in that function, it was somehow exited long ago by the time
GC runs.

As for the tail-recursion part: do you see any sign of that in the
disassembly posted by Robert?  I didn't, but maybe I missed
something.  And such subtleties should only rear their ugly heads in
optimized code, whereas we already know that an unoptimized build
crashes in the same way.
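
To make the tail-call hypothesis concrete, here is a toy sketch in C.
It is not Emacs code: struct toy_face and both functions are
hypothetical stand-ins, and the real face_inherited_attr is more
involved.  The first function is written recursively with the
self-call in tail position; the second is roughly what a compiler may
(but need not) turn it into when optimizing.  With a cyclic inherit
chain, the recursive form built without optimization would exhaust
the stack, while the loop form would spin forever without growing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a face; not the real Emacs struct.  */
struct toy_face
{
  int attr;                        /* -1 means "unspecified here" */
  const struct toy_face *inherit;  /* parent face, or NULL */
};

/* Recursive form.  The self-call is the last thing the function does,
   so an optimizing compiler is allowed to replace it with a jump.  */
int
toy_inherited_attr (const struct toy_face *face)
{
  if (face == NULL)
    return -1;
  if (face->attr != -1)
    return face->attr;
  return toy_inherited_attr (face->inherit);
}

/* Loop form: the shape of the code after tail-call elimination.
   It uses constant stack space regardless of the chain length.  */
int
toy_inherited_attr_loop (const struct toy_face *face)
{
  while (face != NULL && face->attr == -1)
    face = face->inherit;
  return face != NULL ? face->attr : -1;
}
```

Both forms return the same value for any finite chain; they differ
only in how a faulty, cyclic chain fails.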

I still think the shortest way to find the culprit here is to
patiently and painfully go over the last_marked array, deciphering
the Lisp objects we marked, until we succeed in identifying the Lisp
data structure which got corrupted.  Once we have identified that
data structure, it should be relatively easy to find what corrupts
it and where.  This may mean a lot of inconvenient drudgery,
exacerbated by the fact that having a functional GDB on macOS is not
easy, but I don't think we have a better way at this point.
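
For readers unfamiliar with the idea behind last_marked, the
following is a minimal sketch of such a history buffer: a fixed-size
ring that records each object as it is marked, so that after a crash
one can walk backwards from the most recent entry.  Everything here
is a simplified assumption for illustration: the toy_ names, the
buffer size of 8, and the use of plain pointers instead of
Lisp_Object are mine, not what alloc.c does.

```c
#include <stddef.h>

#define TOY_LAST_MARKED_SIZE 8   /* assumed size, for illustration only */

/* Ring buffer of the most recently marked objects.  */
static const void *toy_last_marked[TOY_LAST_MARKED_SIZE];
static size_t toy_last_marked_index;   /* next slot to be written */

/* Record OBJ as just marked, overwriting the oldest entry.  */
void
toy_record_marked (const void *obj)
{
  toy_last_marked[toy_last_marked_index] = obj;
  toy_last_marked_index
    = (toy_last_marked_index + 1) % TOY_LAST_MARKED_SIZE;
}

/* Return the object marked AGE steps ago; 0 is the most recent.
   Return NULL if AGE is beyond what the buffer can hold.  */
const void *
toy_marked_ago (size_t age)
{
  if (age >= TOY_LAST_MARKED_SIZE)
    return NULL;
  size_t i = (toy_last_marked_index + TOY_LAST_MARKED_SIZE - 1 - age)
             % TOY_LAST_MARKED_SIZE;
  return toy_last_marked[i];
}
```

Walking from age 0 upwards gives the marking order in reverse, which
is the order in which one would decipher the entries in a debugger.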
