Re: Loading tramp for dump goes into infinite regress

From: Lynn Winebarger
Subject: Re: Loading tramp for dump goes into infinite regress
Date: Tue, 9 Aug 2022 08:29:13 -0400

I wanted to circle back and answer this now that I have a working "mega-dump".

On Mon, Jul 25, 2022 at 9:56 AM Eli Zaretskii <eliz@gnu.org> wrote:

> > Another benefit I expect from native-compilation, dumped or not, is more 
> > efficient memory use when running
> > multiple emacs processes.  With dumping, I would expect (or hope for) 
> > better garbage collector behavior
> > since the amount of allocation required for the loaded modules should be 
> > pre-determined (whether byte- or
> > native-compiled).  If the image is 300MB (including the shared libraries), 
> > so be it, as long as the memory is
> > shared between multiple processes.
> I don't think I understand this expectation, and I don't think
> natively-compiled code has any advantages wrt GC over the
> byte-compiled code.

There should be at least one advantage: calls to other byte-code
functions (even within the same compilation unit) are represented by
heap references that must be traced by the GC.  Take this with a grain
of salt, as I have not verified it by inspecting the natively compiled
code, but many of those heap references should be translated into
control transfers to other code addresses, which are neither
heap-allocated nor traced by the GC.
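As a toy illustration (invented names, not Emacs internals), here is the
difference in GC terms between a callee held as a traced heap reference
and a callee reached through a bare code address:

```python
# Toy illustration (invented names, not Emacs internals): a byte-code
# function holds its callees as heap references in a constants vector,
# which the GC must trace; a natively compiled caller can reach its
# callee through a bare code address that the GC never sees.

class ByteCodeFn:
    def __init__(self, name, constants):
        self.name = name
        self.constants = constants      # heap references, incl. callees

class NativeFn:
    def __init__(self, name, target_addr):
        self.name = name
        self.target = target_addr       # an immediate address, not a heap ref

def trace(roots):
    """Mark everything reachable through traced heap references."""
    marked, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if id(obj) in marked:
            continue
        marked.add(id(obj))
        if isinstance(obj, ByteCodeFn):
            stack.extend(obj.constants)
    return marked

callee = ByteCodeFn("helper", [])
bc_caller = ByteCodeFn("main", [callee])    # callee kept live via heap ref
nc_caller = NativeFn("main", 0x401000)      # callee reached by code address

marked = trace([bc_caller, nc_caller])
assert id(callee) in marked                 # byte-code callee must be traced
```

The native caller contributes nothing for the marker to follow, which is
the reduction in traced references suggested above.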
But that isn't what I was referring to above.
The first statement was about the efficiency of sharing the read-only
code pages of shared libraries between multiple processes, versus byte
code being loaded into modifiable memory regions along with mutable
data, which increases the probability that the memory will either not
be shared from the start or will become unshareable due to a
copy-on-write (CoW) fault.
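A minimal sketch of that CoW effect, assuming a POSIX system (MAP_PRIVATE
semantics), using a file-backed private mapping to stand in for a loaded
byte-code region:

```python
import mmap
import os
import tempfile

# Sketch (POSIX-only assumption): a MAP_PRIVATE mapping shares pages with
# its backing file until the first write; that write triggers a CoW fault
# and the page stops being shared -- analogous to byte code loaded into
# writable pages alongside mutable data.

fd, path = tempfile.mkstemp()
os.write(fd, b"A" * mmap.PAGESIZE)
os.close(fd)

f = open(path, "r+b")
m = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_PRIVATE)
m[0:1] = b"B"                       # CoW fault: this page is now private
mapped_byte = bytes(m[0:1])

with open(path, "rb") as g:
    file_byte = g.read(1)           # backing (shareable) file is unchanged

m.close()
f.close()
os.unlink(path)
```

After the write, the process holds a private copy of the page while the
backing file still reads `b"A"` — exactly the divergence that makes the
memory unshareable.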
The second statement was comparing dumped versus undumped performance,
whether native- or byte-compiled.  This isn't a novel observation, but
the dump is basically a one-shot mark-compact collection, and the
mmap'ed pdump files would ideally be treated like the old generation of
a generational collector, and so not traced in most (or all) collection
cycles.  This is (or should be) one benefit of pure space, although the
requirement that it be read-only could be dropped as long as the write
barrier remains and is used to record any inter-generational pointers
in a root set traced by the collector.
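A minimal sketch of that scheme, with invented names: the "old" (dumped)
generation is never scanned directly, and a write barrier records
old-to-young stores in a remembered set that joins the minor-collection
roots:

```python
# Minimal generational-GC sketch (invented names, not Emacs code): the
# dumped "old" generation is not scanned during a minor collection; a
# write barrier records any old->young pointer in a remembered set that
# is added to the roots.

class Obj:
    def __init__(self, gen):
        self.gen = gen              # "old" (dumped) or "young"
        self.refs = []

remembered = set()

def barrier_store(src, dst):
    """Write barrier: record stores that create old->young edges."""
    src.refs.append(dst)
    if src.gen == "old" and dst.gen == "young":
        remembered.add(src)

def minor_collect(roots, young):
    """Trace young objects from the roots plus the remembered set."""
    live, stack = set(), list(roots) + list(remembered)
    while stack:
        obj = stack.pop()
        for ref in obj.refs:
            if ref.gen == "young" and ref not in live:
                live.add(ref)
                stack.append(ref)
    return [o for o in young if o not in live]   # unreachable young objects

old = Obj("old")
y1, y2 = Obj("young"), Obj("young")
barrier_store(old, y1)              # a dumped object now points at young data
dead = minor_collect(roots=[], young=[y1, y2])
assert y1 not in dead               # kept alive via the remembered set
assert y2 in dead                   # unreferenced young object is garbage
```

The old generation never appears on the trace stack except as a
remembered-set entry, which is what would keep the dumped image out of
most collection cycles.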

> > I'd also like a baseline expectation of performance with native-compiled 
> > libraries in the dumped image.
> What kind of performance?

Primarily usability.  And I can report that going from undumped to
dumped native-compiled does massively improve performance.  I have not
measured byte-compiled versus native-compiled when both are dumped.
Even with 2100+ eln files being loaded, startup is very fast, even when
loading my init settings (which turn on all the Semantic minor modes,
for example).
When I turned on the profiler for a while, it reported that about 45%
of the CPU time was spent in GC, which I found surprising.  It was a
little laggy in some places, but not "stop-the-world collection of a
200MB heap" laggy.
I'm tempted to try a quickish hack to see if I can turn pure space into
a never-collected elder generation: make its size a variable instead of
a compile-time define, point pure space at the start of the dump image,
and modify the write barrier as described above, just for my local 28.1
variant.  I don't mind wasting the memory if it keeps the heap traced
by the collector at a reasonable size, focused on shorter-lived
constructs rather than the ones that survive the dump collection.  I'd
be surprised if it worked that easily, though.

> I'm saying that your job could be easier if you did the first step
> with byte-compiled files.

And it was - thanks.

Let me know if there are any specific metrics that would be useful.  I
can't promise I'll be able to get back with details quickly (I may have
to create novel implementations of the algorithms on my personal
machines), but I will see what I can do.  The Emacs dev team and
contributors have really made tremendous improvements since I last
considered using the Emacs source for playing around with some
language-implementation ideas, somewhere in the 2004-2007 period.  Back
then I thought the massive code base of dynamic scoping and the
implementation of closures with "funvecs" (as they were called then)
was just too big to contemplate, but you guys have done it.  "Closing
the loop" by enabling the dumping of additional natively compiled
libraries is a minor tweak in the code, but it makes a huge difference
if you're trying to make use of all the additional functionality
provided by the available add-on packages.

