Re: [scratch/igc] 985247b6bee crash on Linux, KDE, Wayland
From: Pip Cet
Subject: Re: [scratch/igc] 985247b6bee crash on Linux, KDE, Wayland
Date: Sat, 07 Sep 2024 09:05:46 +0000

"Eli Zaretskii" <eliz@gnu.org> writes:

>> Date: Fri, 06 Sep 2024 19:29:28 +0000
>> From: Pip Cet <pipcet@protonmail.com>
>> Cc: Eli Zaretskii <eliz@gnu.org>, gerd.moellmann@gmail.com,
>> emacs-devel@gnu.org
>>
>> So we can decode those to three interleaved lists reading, in part:
>>
>> (nil font-lock-face (:foreground ...))
>> (rear-nonsticky t <bad symbol> ...)
>> (nil font-lock-face (...))
>>
>> <bad symbol> is a pointer to what looks like the nursery generation, but
>> one which we must have failed to trace (presumably the symbol was either
>> uninterned and freed or interned and moved to an older generation) and
>> which was subsequently reused for cons cells by composite.c
>>
>> Going back to the original report, I notice that it was trying to print
>> an "error in process filter: " message while handling what looks like a
>> (long) sequence of terminal escape codes. Were you using M-x term at
>> the time? Did you notice such error messages?
>>
>> I'll have another look at the process filter/longjmp code, but I suspect
>> we're going to have to wait for further crashes to get to the bottom of
>> this.
>
> What data is missing to get to the bottom of this, and how can we
> change the code and/or add some .gdbinit magic to provide that data?

I don't think .gdbinit magic would work.

The main problem is that while MPS GC should happen more frequently than
traditional GC, it's still unlikely to crash near the code that failed
to trace objects. We got lucky there a few times, but it looks like our
luck ran out here.

So a first change would be an option for very eager garbage collection;
I'd already proposed a patch to do so on a separate OS thread, but it
would be better to do so on the main thread, to avoid false positives
when main thread code deliberately leaves things in an inconsistent
state while assuming GC doesn't happen.

> In general, our current facilities to investigate igc-related crashes
> are clearly insufficient.

I agree.

> The old GC has the last_marked[] array, which could be used to trace
> back any bad values which caused a GC-related crash, and I used that
> on several occasions.

To be honest, I don't even know whether MPS uses depth-first marking
(which would make the last_marked[] array useful).

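For reference, the mechanism behind a last_marked[]-style array is just a
fixed-size ring buffer that the marker writes every visited object into.
The following is a minimal standalone sketch, not the alloc.c code and not
a patch for igc.c; the names trace_log, trace_log_record and
trace_log_recent are hypothetical, and objects are represented as plain
uintptr_t values for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration only.  */
#define TRACE_LOG_SIZE 512      /* must be a power of two */

static uintptr_t trace_log[TRACE_LOG_SIZE];
static size_t trace_log_index;  /* next slot to overwrite */

/* Record OBJ as the most recently visited object, overwriting the
   oldest entry once the buffer is full.  */
static void
trace_log_record (uintptr_t obj)
{
  trace_log[trace_log_index] = obj;
  trace_log_index = (trace_log_index + 1) & (TRACE_LOG_SIZE - 1);
}

/* Return the object visited AGE steps ago (0 = most recent).
   Only meaningful once at least AGE + 1 objects have been recorded.  */
static uintptr_t
trace_log_recent (size_t age)
{
  assert (age < TRACE_LOG_SIZE);
  return trace_log[(trace_log_index - 1 - age) & (TRACE_LOG_SIZE - 1)];
}
```

After a crash, walking the buffer backwards from the most recent entry
shows what was visited just before the bad value. With depth-first marking
those entries tend to be the objects that led to it; with another
traversal order they may be unrelated.
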
> But there's nothing similar in igc.c, which
> makes the investigation basically a guesswork. How can we improve
> this situation? I expect this kind of trouble to happen a lot in the
> near future, so having efficient tools for debugging is crucial, IMO.

Just off the top of my head, here are a few ideas:

1. Make garbage collection much more eager. Easy to do, high
performance cost, provides slightly better traces. In particular,
always perform a full GC after returning from a non-local exit, since
such an exit invalidates many ambiguous references at once.

2. A last_marked[] array. Should be cheap to do if it's fixed size, but
may not help very much if the order MPS traces objects in has poor
locality.

3. Use the "extended header" (which already exists) to save a backtrace
for the function which allocated an object. This would greatly increase
memory usage for the whole of Emacs. I believe, in most cases, this is
the information we need: something allocates an object and stores a
reference to it in memory that's invisible to MPS, so the reference is
not fixed when the object moves. Later the code retrieves the stale
reference (which now points to arbitrary memory in the arena); the next
GC traces it and crashes.

4. Save a log of what moved where. This would allow us, in this case,
to at least find out what the <bad symbol> above was, I think.

5. Provide a facility which repeatedly walks all pools and ends up
returning a (shortest) path of references which keep an object alive.
This works well for other languages, but Lisp tends to have very long
paths because of the cons cell linkage. I'm not sure how difficult this
would be to implement.

6. Anything that involves modifying MPS. Last for obvious reasons.
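
To make idea 4 a little more concrete, here is a minimal standalone sketch
of a fixed-size move log that can answer "what used to live at this
address?". It is an illustration under stated assumptions, not code from
igc.c: struct move_record, move_log_record and move_log_find are
hypothetical names, and where the recording call would go (presumably
wherever the collector learns that an object has been moved) is left
open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration only.  */
#define MOVE_LOG_SIZE 1024      /* must be a power of two */

struct move_record
{
  uintptr_t from;               /* old base address */
  uintptr_t to;                 /* new base address */
  size_t size;                  /* object size in bytes */
};

static struct move_record move_log[MOVE_LOG_SIZE];
static size_t move_log_count;   /* total number of moves ever recorded */

/* Record that an object of SIZE bytes moved from FROM to TO,
   overwriting the oldest record once the log is full.  */
static void
move_log_record (uintptr_t from, uintptr_t to, size_t size)
{
  assert (size > 0);
  struct move_record *r = &move_log[move_log_count & (MOVE_LOG_SIZE - 1)];
  r->from = from;
  r->to = to;
  r->size = size;
  move_log_count++;
}

/* Return the most recent logged move whose old extent contains ADDR,
   or NULL if no such move is still in the log.  */
static const struct move_record *
move_log_find (uintptr_t addr)
{
  size_t n = move_log_count < MOVE_LOG_SIZE ? move_log_count : MOVE_LOG_SIZE;
  for (size_t i = 1; i <= n; i++)
    {
      const struct move_record *r
        = &move_log[(move_log_count - i) & (MOVE_LOG_SIZE - 1)];
      if (addr >= r->from && addr - r->from < r->size)
        return r;
    }
  return NULL;
}
```

Given a stale pointer such as the <bad symbol> above, calling
move_log_find on it from a debugger would show where the object went, as
long as the move is recent enough to still be in the log. The search is
linear, which seems acceptable for a facility used only after a crash.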

Just ideas for now, I'm afraid, no code yet.

Pip