emacs-devel

Dumper problems and a possible solutions


From: Rich Felker
Subject: Dumper problems and a possible solutions
Date: Tue, 24 Jun 2014 13:19:55 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

I've been trying to get current emacs working on musl libc based
systems, and running into trouble with the dumper. After a lot of
hacking I got it to work, and I'm hoping something based on the ideas
(not the implementation, which is hideously ugly) in my work could be
acceptable upstream.

By far the biggest problem is malloc-related. musl does not support
overriding malloc, so I'm building with system malloc. The emacs
dumper assumes all of the allocations that need to survive dumping end
up in the brk segment, which is not a constraint musl's allocator can
satisfy -- it has no support for huge allocations in brk, and will
sometimes opt to use mmap rather than brk for extending the heap for
small objects.

The hack I used to solve this is really simple: I added to alloc.c a
tiny allocator that just uses a giant static buffer which gets used
for lisp object allocations prior to dumping (it has an extern flag
that the dumper sets to indicate whether this code is running prior to
dumping or afterwards). With free being a NOP, as it is in my
implementation right now, I had to make this buffer 400 megs; that's
the main reason the patch would be utterly unacceptable as-is. I
believe the problem could be solved by writing a trivial "early
malloc" implementation that uses a static buffer, but with proper
recycling of free areas.
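
To make the shape of that hack concrete, here is a minimal sketch of a static-buffer allocator gated by a dumper-controlled flag. It is not the actual patch: every name in it (early_heap, early_alloc_dumped, lisp_alloc, lisp_free, early_owns) is hypothetical, the buffer is 1 MiB only for illustration, and free is still a no-op for early objects, so it has the same space problem described above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; sizes are for illustration only. */
#define EARLY_HEAP_SIZE ((size_t)1 << 20)
#define EARLY_ALIGN ((size_t)16)

/* Static buffer: it lives in the executable's data, so the dumper saves it. */
static _Alignas(16) unsigned char early_heap[EARLY_HEAP_SIZE];
static size_t early_used;

/* The dumper sets this to nonzero once dumping is finished. */
int early_alloc_dumped;

/* Nonzero if p points into the static buffer. */
int early_owns(const void *p)
{
    uintptr_t a = (uintptr_t)p, lo = (uintptr_t)early_heap;
    return a >= lo && a - lo < EARLY_HEAP_SIZE;
}

/* Bump allocation: hand out the next aligned chunk, NULL when exhausted. */
static void *early_malloc(size_t n)
{
    if (n == 0)
        n = 1;
    if (n > EARLY_HEAP_SIZE)
        return NULL;
    size_t rounded = (n + EARLY_ALIGN - 1) & ~(EARLY_ALIGN - 1);
    if (rounded > EARLY_HEAP_SIZE - early_used)
        return NULL;
    void *p = early_heap + early_used;
    early_used += rounded;
    return p;
}

/* Before dumping use the static buffer; afterwards use the system malloc. */
void *lisp_alloc(size_t n)
{
    return early_alloc_dumped ? malloc(n) : early_malloc(n);
}

/* Early objects are never recycled (free is a no-op for them). */
void lisp_free(void *p)
{
    if (!early_owns(p))
        free(p);
}
```

A version with proper recycling would replace early_malloc and the no-op branch of lisp_free with a small free-list allocator over the same buffer; the ownership test is what lets post-dump code safely "free" objects that were allocated before the dump.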

There are also, however, at least two other issues, both involving
global state in libc, which affect static linking. Dynamic linking does
not seem to be affected because the dumper doesn't save libc's globals
in the dynamic-linked case:

One is that, with modern Linux kernels with brk randomization as part
of ASLR, dumping saves malloc's idea of the current brk, and when it
mismatches at runtime, malloc will either crash or "allocate" memory
that's not even mapped. Trying to work around this in musl is not
acceptable because it would penalize all programs with extra syscalls
whenever malloc has to expand the brk.

The other issue is that musl's clock_gettime and related functions
store the pointer to the vdso version of this function. Since the
kernel maps vdso at a random address, the stored value from before
dumping will not be valid when the dumped executable is run.
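
Both values can be observed directly. The following is a small Linux-only sketch (the helper names current_brk, vdso_base and print_layout are made up for illustration) that reports the program break and the vDSO base address of the running process. With ASLR enabled, calling print_layout from a test program and running it twice would typically show different numbers each time, which is why copies of either value saved at dump time cannot be trusted later.

```c
#define _DEFAULT_SOURCE   /* make sbrk() visible under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>     /* getauxval, AT_SYSINFO_EHDR (Linux) */
#include <unistd.h>       /* sbrk */

/* Current program break as seen by this process. */
uintptr_t current_brk(void)
{
    return (uintptr_t)sbrk(0);
}

/* Address where the kernel mapped the vDSO, or 0 if none is reported. */
uintptr_t vdso_base(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* Print both values on one line. */
void print_layout(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "brk=%#lx vdso=%#lx\n",
            (unsigned long)current_brk(), (unsigned long)vdso_base());
}
```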

To solve ALL of the problems with the dumper (which seems to be a
recurring theme), I have a proposed design to make it fully portable
-- even more so than XEmacs's "portable dumper", which is still an ugly
hack. The idea is simple: after loading all of the lisp objects that
need dumping, walk the lisp heap and output a representation for each
object as a giant static array in C source format, then compile and
link this new translation unit with the rest of the emacs .o files to
produce a final emacs binary. No hacks with binary formats would be
involved; everything would happen at the C source level. As part of
the lisp heap dumping, address references to other objects would have
to be relocated to refer to the object's position in the static array
rather than the original address at which the object resided when
created in temacs. That's some non-trivial work, but definitely not
prohibitive, and as a bonus, the generated address-constant references
in the static array would transform to load-address-relative
relocations for the linker, allowing emacs to be built as a
position-independent executable (PIE) if desired.
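
As a rough illustration of the relocation step, here is a toy emitter. It assumes a drastically simplified object type (a struct cell with one value and one pointer, standing in for a cons cell) and hypothetical names (emit_dump, dumped_heap); real Lisp objects are tagged words and would need a per-type emitter. It only shows the idea that a pointer into the live heap becomes the address of a slot in the generated array.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a Lisp cons cell (hypothetical layout). */
struct cell {
    long value;
    struct cell *next;   /* NULL terminates the list */
};

/* Index of p in objs[0..n), or -1 if p is NULL or not in the dump set. */
static long index_of(struct cell *const *objs, size_t n, const struct cell *p)
{
    for (size_t i = 0; i < n; i++)
        if (objs[i] == p)
            return (long)i;
    return -1;
}

/* Write a C translation unit defining dumped_heap[]. Each live pointer is
   rewritten as &dumped_heap[k], an address constant the linker relocates.
   Returns 0 on success, -1 if the set is empty or a pointer escapes it. */
int emit_dump(FILE *out, struct cell *const *objs, size_t n)
{
    if (n == 0)
        return -1;
    /* Validate first so nothing is written for an incomplete object set. */
    for (size_t i = 0; i < n; i++)
        if (objs[i]->next && index_of(objs, n, objs[i]->next) < 0)
            return -1;

    fprintf(out, "struct cell { long value; struct cell *next; };\n");
    fprintf(out, "struct cell dumped_heap[%zu] = {\n", n);
    for (size_t i = 0; i < n; i++) {
        long target = index_of(objs, n, objs[i]->next);
        fprintf(out, "  { %ld, ", objs[i]->value);
        if (target >= 0)
            fprintf(out, "&dumped_heap[%ld] },\n", target);
        else
            fprintf(out, "0 },\n");
    }
    fprintf(out, "};\n");
    return 0;
}
```

For a three-element list holding 1, 2 and 3, the emitted array has entries { 1, &dumped_heap[1] }, { 2, &dumped_heap[2] } and { 3, 0 }. Since those initializers are ordinary address constants, the compiler and linker handle them like any other static data, with no knowledge of the executable format needed.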

Does this sound like a viable direction? I'm not an emacs hacker by
any means and don't think I'm qualified to do the lisp heap dumping
implementation, but I could certainly help with design or any issues
that arise during implementation if others are interested in working
on it.

If not, or if that's going to be a very long-term project, would a
cleaned-up version of my current solution be acceptable upstream?

Rich
