From: Daniel Colascione
Subject: Re: Crash robustness (Was: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.)
Date: Wed, 23 Dec 2015 09:41:20 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 12/23/2015 09:30 AM, Eli Zaretskii wrote:
>> Cc: address@hidden, address@hidden,
>>  address@hidden, address@hidden, address@hidden
>> From: Daniel Colascione <address@hidden>
>> Date: Wed, 23 Dec 2015 08:25:51 -0800
>> We can alloca, say, 8MB, and write to the start and end of the allocated
>> region.
> How do you know the alloca won't trigger stack overflow?

We don't know that, but at program startup, we have no data to lose. How
do you know Emacs BSS requirements won't run the system out of memory?
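A minimal sketch of such a startup probe, assuming a downward-growing stack, 4096-byte pages, and a platform that provides `alloca.h`. The name `probe_stack` is hypothetical and is not Emacs code. It also departs from the proposal in one respect: it writes one byte per page stride rather than only the start and end, so the probe cannot step over a guard page.

```c
#include <alloca.h>
#include <stddef.h>

/* Hypothetical helper, not Emacs code.  Reserve BYTES of stack with
   alloca and write one byte every PAGE bytes, starting at the end
   nearest the caller's frame and moving deeper.  If the stack cannot
   hold BYTES, the fault happens here, at startup.  Returns the number
   of strides written.  */
size_t
probe_stack (size_t bytes)
{
  enum { PAGE = 4096 };              /* assumed page size */
  if (bytes == 0)
    return 0;

  /* volatile keeps the compiler from discarding the stores. */
  volatile char *region = alloca (bytes);
  size_t pages = 0;
  size_t off = bytes;

  while (off > 0)
    {
      region[off - 1] = 0;           /* touch this stride */
      pages++;
      off = off > PAGE ? off - PAGE : 0;
    }
  region[0] = 0;                     /* deepest byte of the region */
  return pages;
}
```

The probed size has to stay below the stack limit the process actually runs with; calling `probe_stack (8 * 1024 * 1024)` under a default 8 MB limit would crash, which at startup costs no user data.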

>> Then we'll know we have at least that much stack space available.
> At that point, yes.  But you need to know that at many other points,
> when some of the stack is already used up.

Sure. But now Emacs can ask itself, "do I have at least X KB of stack
space available?", and if the answer is "no", signal if Y KB of stack is
available (Y<X, of course), or abort if not.
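The X/Y check could look roughly like the following sketch. All names here (`stack_budget_init`, `stack_remaining`, `stack_check`) are hypothetical, and the code assumes a downward-growing stack and that the budget passed in was established earlier, for example by a startup probe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum stack_status { STACK_OK, STACK_LOW, STACK_EXHAUSTED };

static uintptr_t stack_base;   /* address recorded near the top of the stack */
static size_t stack_budget;    /* bytes assumed usable below stack_base */

/* Call once, early in main, with the amount of stack known to be usable. */
void
stack_budget_init (size_t known_bytes)
{
  volatile char marker;
  stack_base = (uintptr_t) &marker;
  stack_budget = known_bytes;
}

/* Estimate how much of the budget is left at the current call depth. */
size_t
stack_remaining (void)
{
  volatile char marker;
  uintptr_t here = (uintptr_t) &marker;
  size_t used = stack_base > here ? (size_t) (stack_base - here) : 0;
  return used < stack_budget ? stack_budget - used : 0;
}

/* STACK_OK: at least want_x bytes left, carry on.
   STACK_LOW: fewer than want_x but at least need_y, signal an error.
   STACK_EXHAUSTED: fewer than need_y, abort.  */
enum stack_status
stack_check (size_t want_x, size_t need_y)
{
  assert (need_y <= want_x);
  size_t left = stack_remaining ();
  if (left >= want_x)
    return STACK_OK;
  if (left >= need_y)
    return STACK_LOW;
  return STACK_EXHAUSTED;
}
```

A caller about to recurse deeply would run `stack_check (X, Y)` first and act on the result, so the overflow is detected in ordinary code rather than in a signal handler.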

>>> I simply don't see any trouble this could cause, except leaking some
>>> memory.  Can you describe in enough detail a single use case where
>>> this could have any other adverse effects that we should care about
>>> when recovering from stack overflow?
>> What happens if we overflow inside malloc? One possibility is that we'll
>> longjmp back to toplevel without releasing the heap lock, then deadlock
>> the next time we try to allocate.
> I very much doubt anything like that can happen.  An malloc
> implementation which behaves like that won't last long.  Lots of C
> programs longjmp from signal handlers, so interrupting malloc with,
> say, SIGINT, must work.  I think even Emacs did something like that in
> the past, at least on a TTY, where C-g triggers SIGINT.

These programs are all unsafe. If they work, it's by luck alone. In
fact, it's not possible to write a malloc that behaves the way you'd
like, since malloc can legitimately take locks, and the system provides
no way to release them on non-local exit from a signal handler.

You're essentially claiming that programs using pthread_mutex_lock won't
last long. There are a few existence proofs here and there to the contrary.

The problem isn't limited to locks. Malloc could be in the middle of
updating internal data structures when you longjmp out of it. The next
allocation could scribble over arbitrary memory.
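The lock half of this can be shown with a toy model. The sketch below is not any real malloc: `heap_lock` and `toy_alloc_interrupted` are hypothetical stand-ins, and the `longjmp` is called directly rather than from a signal handler so the outcome is deterministic. The effect on the mutex is the same either way.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <setjmp.h>

static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static jmp_buf toplevel;

/* Stand-in for an allocator's critical section that gets abandoned
   partway through by a non-local exit.  */
static void
toy_alloc_interrupted (void)
{
  int rc = pthread_mutex_lock (&heap_lock);
  assert (rc == 0);
  /* ... internal data structures would be half-updated here ... */
  longjmp (toplevel, 1);               /* models the jump back to toplevel */
  pthread_mutex_unlock (&heap_lock);   /* never reached */
}

/* "Recover" to toplevel, then report what the next allocation would
   find: 0 if the lock is free, EBUSY if it is still held.  */
int
heap_lock_state_after_recovery (void)
{
  if (setjmp (toplevel) == 0)
    toy_alloc_interrupted ();

  int rc = pthread_mutex_trylock (&heap_lock);
  if (rc == 0)
    pthread_mutex_unlock (&heap_lock);
  return rc;
}
```

With a default mutex the function returns `EBUSY`: nothing released the lock on the way out. A blocking `pthread_mutex_lock` at that point, which is what a second call to `toy_alloc_interrupted` would do, would hang.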

>>>> We have a program that has its own Lisp runtime, has its own memory
>>>> allocation system, uses its own virtual filesystem access layer, and
>>>> that brings itself back from the dead. We're well past replicating OS
>>>> functionality.
>>> Actually, most of the above is simply untrue: we use system allocators
>>> to allocate memory
>> We have internal allocators for strings and conses and use the system
>> allocator only for backing storage.
> On some systems.  Not on all of them.
>>> , and if by "bringing itself from the dead" you allude to
>>> unexec, then what it does is a subset of what every linker does,
>>> hardly an OS stuff.
>> Granted, that's toolchain work, not "OS" work, but it's still outside
>> the domain of most text editors.
> Sure.  But a linker is still an application that reads and writes
> files.  It doesn't futz with OS-level features like page protection
> and processor exceptions.

What's so scary about page protection? I've yet to see a coherent
argument for why we shouldn't take advantage of the facility where it's

>>> Emacs is not safety-critical software.  It doesn't need to be "safe"
>>> by your definition, if I understand it correctly.
>> It's not safety-critical software, but undefined behavior is undefined.
>> What makes us confident that we can't corrupt buffer data by longjmping
>> from the wrong place?
> Nothing makes us confident.  Recovery from stack overflow is not
> guaranteed to work in all cases.  But if it works in some of them, it
> is already better than always crashing, IMO.

Why? If we can prevent data loss, I'd rather reliably crash than enter
some frankenstate where anything can happen.

>> Anything can happen because we can longjmp from anywhere.
> Yes.  But if we hit a stack overflow, we are already in deep trouble.

And it's because we're in deep trouble that we should kill the program
as quickly as possible.

>> What if we just installed a SIGSEGV handler (or, on Windows, a vectored
>> exception handler) that wrote buffer contents to a special file on a
>> fatal signal, then allowed that fatal signal to propagate normally?
> I presume you mean auto-save, not save.
> We could try calling shut_down_emacs from the signal handler, but I'm
> not sure if the small alternate stack will be enough for write-region.
> Something to investigate, I guess.

We can make the alternate signal stack as large as we want.

>> The next time Emacs starts, we can restore the buffers we've saved
>> this way and ask users to save them --- just like autosave, but done
>> on-demand, at crash time, in C code, on the alternate signal stack.
> Why "like autosave"?  What will be different from actually
> auto-saving?  shut_down_emacs does that automatically.

Er, yes, I noticed after I wrote the email that we already do what
I propose, more or less. In this case, we don't lose very much by just
deleting the stack overflow code and relying on autosave.
