
Re: Skipping unexec via a big .elc file

From: Daniel Colascione
Subject: Re: Skipping unexec via a big .elc file
Date: Tue, 25 Oct 2016 09:14:55 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0

On 10/25/2016 08:59 AM, Eli Zaretskii wrote:
>> From: Daniel Colascione <address@hidden>
>> Cc: Eli Zaretskii <address@hidden>,  address@hidden,  address@hidden
>> Date: Mon, 24 Oct 2016 12:47:56 -0700

>>> I'd argue that we are already in this situation.  For example, nobody
>>> knows how to make unexec work with ASLR or PIE; when I tried fuzzing
>>> Emacs with AFL, the dumped binary would simply crash; the dumped
>>> binary is not reproducible (i.e. bit-by-bit identical after every
>>> build); and I think dumping also doesn't work with ASan.  The fraction
>>> of situations where unexec doesn't work any more gets larger and
>>> larger.  If we had people who could solve these problems, it would get
>>> smaller instead.

>> Everyone who's seriously thought about the unexec problem _understands_
>> the issue.

> The important point is that the number of people here who can claim
> such understanding, enough so to fix the issues, is diminishingly
> small, and gets smaller every year.

There's no demand for more yet. There isn't a catastrophe --- just low demand for core-change expertise. There used to be a lot more stonemasons (at least per capita) in historical societies than in today's society. That doesn't mean we've forgotten how to cut stone, and if there were a sudden need to do it, more stonemasons would magically appear.

>> My preferred approach is the portable dumper one: basically what we're
>> doing today, except that instead of just blindly copying the data
>> segment and heap to a new emacs binary, we'll write this information to
>> a separate file, stored in a portable format, a file that we'll keep
>> alongside the Emacs binary.  We'll store in this file metadata about
>> where the pointers are.  (There are two kinds of pointers in this file:
>> pointers to other parts of the file and pointers to the Emacs binary.)
>>
>> At startup, we'll load the dump file and walk the relocations, fixing up
>> all the embedded addresses to account for the new process's different
>> address space.
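[A minimal sketch of what such a relocation table and startup fixup walk might look like. All names and the record layout here are hypothetical illustrations, not an actual Emacs design:]

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical relocation record stored in the dump file.  Each record
   names a pointer-sized slot inside the dump and says what that slot
   should point at: another spot in the dump, or a spot in the running
   Emacs image (e.g. a staticpro'd variable or a C function).  */
enum reloc_kind { RELOC_DUMP_TO_DUMP, RELOC_DUMP_TO_EMACS };

struct reloc
{
  enum reloc_kind kind;
  size_t slot_offset;    /* where in the dump the pointer lives */
  size_t target_offset;  /* offset into the dump or the Emacs image */
};

/* At startup, walk the relocation table once and rewrite every stored
   pointer in terms of this process's actual load addresses.  */
static void
apply_relocs (char *dump_base, char *emacs_base,
              const struct reloc *relocs, size_t nrelocs)
{
  for (size_t i = 0; i < nrelocs; i++)
    {
      uintptr_t value;
      if (relocs[i].kind == RELOC_DUMP_TO_DUMP)
        value = (uintptr_t) (dump_base + relocs[i].target_offset);
      else
        value = (uintptr_t) (emacs_base + relocs[i].target_offset);
      memcpy (dump_base + relocs[i].slot_offset, &value, sizeof value);
    }
}
```

[Because the records carry offsets rather than absolute addresses, the same dump file works no matter where ASLR or PIE places the dump mapping or the Emacs image.]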

> Why do you think this will have better performance than reading a
> single .elc file at startup?  It's still mainly file I/O and
> processing of the file's contents, just like with byte-compiled files.

Because a portable dumper can do less, on both file I/O and processing of the file's contents: there's no Lisp evaluation and no slurping of a whole file into memory. Having to read all of Emacs into memory on startup is a burden even on a fast, modern machine like mine.

$ sync && sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'

$ time pv < emacs >/dev/null
48.6MiB 0:00:00 [ 455MiB/s] [=========================================================>] 100%

real    0m0.116s
user    0m0.000s
sys     0m0.016s

That's pretty fast, but it's not free. Not having to do this much I/O at startup in the first place would be even better.

> If we have no reason to believe this portable dumper will be
> significantly faster, we should IMO investigate the .elc method first,
> because it's so much simpler, both in its implementation and in future
> maintenance.  E.g., adding a new kind of Lisp object to Emacs would
> require corresponding changes in the dumper.

Adding a new kind of Lisp object requires changes throughout the core anyway. At the very least, you need to teach the GC where your new object keeps its pointers, and that's exactly the knowledge the dumper would need.
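[To illustrate the overlap (hypothetical names, not actual Emacs internals): if each object type records where its pointer-valued fields live, one walker can serve both the GC, which passes a marking callback, and a dumper, which passes a callback that emits a relocation for each slot:]

```c
#include <stddef.h>

/* Hypothetical per-type layout description: the offsets of the
   pointer-valued fields inside an object of that type.  */
struct type_layout
{
  const size_t *field_offsets;
  size_t nfields;
};

/* Generic walk over an object's pointer fields.  The GC would pass a
   callback that marks the referent; a dumper would pass one that
   records a relocation for the slot.  */
static void
visit_pointer_fields (const struct type_layout *layout, char *obj,
                      void (*visit) (void **slot, void *closure),
                      void *closure)
{
  for (size_t i = 0; i < layout->nfields; i++)
    visit ((void **) (obj + layout->field_offsets[i]), closure);
}

/* Example callback: collect the visited slots, a stand-in for
   "emit one relocation record per slot".  */
struct slot_list { void **slots[8]; size_t n; };

static void
collect_slot (void **slot, void *closure)
{
  struct slot_list *sl = closure;
  sl->slots[sl->n++] = slot;
}
```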

>> We can't save all of the Emacs data segment this way, but we can
>> relocate and restore anything that's marked with staticpro.  The overall
>> experience should be very similar to what we have today.
>>
>> Speaking of COW faults: a refinement of this scheme is to do the
>> relocations lazily, in a SIGSEGV handler.  (Map the dump file PROT_NONE
>> so any access traps.)  In the SIGSEGV handler, we can relocate just the
>> page we faulted, then continue.  This way, we don't need to slurp in the
>> entire dump file from disk just to start emacs -Q -batch: we can
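[The lazy scheme could look roughly like this on a POSIX system. This is a sketch: the per-page fixup is a stub, and calling mprotect from a signal handler is not strictly async-signal-safe, which a real implementation would have to weigh:]

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static char *dump_start;   /* dump mapped PROT_NONE at startup */
static size_t dump_size;
static int pages_relocated;

/* Stub for "apply the relocations that touch this page"; a real
   dumper would consult its relocation table here.  */
static void
relocate_page (char *page)
{
  (void) page;
  pages_relocated++;
}

/* Fault handler: if the fault hit the dump, make the faulting page
   writable, fix it up, and return so the faulting instruction
   restarts and succeeds.  */
static void
dump_segv_handler (int sig, siginfo_t *info, void *ctx)
{
  (void) sig; (void) ctx;
  char *addr = (char *) info->si_addr;
  if (addr < dump_start || addr >= dump_start + dump_size)
    _exit (1);  /* not our fault; a real handler would chain or re-raise */
  size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);
  char *page = dump_start + ((size_t) (addr - dump_start) & ~(pagesize - 1));
  mprotect (page, pagesize, PROT_READ | PROT_WRITE);
  relocate_page (page);
}
```

[With this in place, emacs -Q -batch pays only for the dump pages it actually touches; pages it never uses are never read or relocated.]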

> Demand paging in an application, and an application such as Emacs on
> top of that, makes little sense to me.

Why? It's conceptually no different from autoload. There is no technique in computer science so rarefied that it's only good in ring zero.

> This is the OS business, not
> ours.  Using mmap as a fast way to read a file, yes, that's done in
> many applications.  But please let's leave demand paging out of our

Emacs isn't just an application. It's a Lisp virtual machine, and employing the optimization techniques used in other virtual machines can be important wins.

(FWIW, mmap isn't a particularly fast way of doing bulk file reads. That's why GNU grep removed its mmap support.)

> IMO the less we mess with low-level techniques that no other
> applications use the better, both because we have very few people who
> can do that and because doing so runs higher risk of becoming broken
> by future developments in the platforms we deem important.  The
> long-term tendency in Emacs development should be to move away from
> such techniques, not to acquire more of them.

I'm for anything that delivers meaningful performance advantages.
