Re: Skipping unexec via a big .elc file

From: Ken Raeburn
Subject: Re: Skipping unexec via a big .elc file
Date: Thu, 27 Oct 2016 04:51:24 -0400

> On Oct 25, 2016, at 09:48, Stefan Monnier <address@hidden> wrote:
>>>> Did you check whether actually byte compiling the written file made
>>>> a difference?
>>> dumped.elc has no code to compile.
>> It has a lot of fset and setplist calls which can be compiled, especially if
>> you reorder things such that they’re not mixed up with the defvar calls that
>> don’t compile.
> "A lot of" is relative: the time to read them compared to an equivalent
> byte-code version should be negligible, and their execution time should
> be even more negligible.
>> The generated .elc output is about 25% larger.
> That's not because of byte-compilation per se.  It's because the
> byte-compiler uses `print-circle' but only within each top-level entity,
> so you lose sharing between functions and between variables.
> IOW you can get the exact same 25% larger file by printing each
> fset/defvar/setplist separately (instead of printing them as one big
> `progn`).  And you can trick the byte-compiler into preserving this sharing
> by replacing the leading `progn` (which the byte-compiler removes) with
> a (let () ...), tho maybe you'll need to really add some dummy binding
> in that `let` to make sure the byte-compiler doesn't end up removing it.

Ah, yes… “(let () …)” was enough with no bindings.  The compiled file, which
now contains only one big byte-code invocation, is still larger than the
original dumped file, though by less than before, and from a couple of spot
checks it looks like the data sharing is indeed preserved.  It also takes
longer to load.  Oh well.

> Ideally, we could get rid of substitute_object_in_subtree entirely.
> E.g. the patch below skips it for the case of "#n=(...)", and by peeping
> ahead to decide the type of placeholder we build, we should be able to
> get rid of it in all cases.

I would think not for types using flexible array members, since we may not know 
the allocation size until we’ve seen the end of the object.
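
As a rough illustration of the flexible-array-member concern, here is a minimal sketch.  It is not the real Emacs vector layout; `toy_vector`, `toy_vector_bytes` and `toy_vector_alloc` are hypothetical names made up for this example.  The point is only that the allocation size depends on the element count, which a reader does not know until it reaches the end of the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Lisp vector; not the real Emacs struct.  */
struct toy_vector
{
  size_t size;
  long contents[];   /* flexible array member: length is fixed at allocation */
};

/* Bytes needed for a vector of N elements.  */
static size_t
toy_vector_bytes (size_t n)
{
  /* Guard against overflow in the size computation.  */
  assert (n <= (SIZE_MAX - offsetof (struct toy_vector, contents))
               / sizeof (long));
  return offsetof (struct toy_vector, contents) + n * sizeof (long);
}

/* Allocate a vector of N elements, or return NULL on failure.
   N must be known up front, so a correctly sized placeholder cannot
   be built before the elements have been counted.  */
static struct toy_vector *
toy_vector_alloc (size_t n)
{
  struct toy_vector *v = malloc (toy_vector_bytes (n));
  if (v)
    v->size = n;
  return v;
}
```

A cons-like object has a fixed size, so a placeholder of the right type can be allocated as soon as the opening delimiter is seen; that does not hold for the struct above.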

In poking around with gdb, most of the invocations of 
substitute_object_in_subtree I looked at got a subtree of nil.  It appears to 
me that if the “subtree” passed isn’t the placeholder and isn’t one of the 
types we process recursively, then we will never do any substitution, right?  
So checking seen_list and read_objects is unnecessary in that case.

I started my tests over with an updated source tree from upstream and put in 
your loadup.el change.  Running “time emacs -batch -l dumped.elc” took 3.5s; 
according to “perf record”/“perf report”, Frassq took about 85% of the CPU 
time, and Fassq took about 9%.

Added your lread.c patch; run time is about 1.8s, 70% in Frassq and almost 20% 
in Fassq.

Patched substitute_object_recurse after the check for the subtree matching the 
placeholder, so that if the subtree passed was a symbol or number, it would 
simply be returned without consulting seen_list or read_objects.  Run time is 
now 0.7s; Fassq is a bit over 50% of that, and Frassq about 17%, and _IO_getc 
around 11%.  I think it should be safe to short-circuit it for some other types 
as well.
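
The short-circuit described above can be sketched on a toy object model.  This is not the lread.c code: `toy_obj`, `toy_substitute` and the fixed-size seen array are hypothetical, and only the seen-list side is modelled (the read_objects lookup is left out).  The sketch shows leaf objects being returned before any lookup takes place.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_type { TOY_SYMBOL, TOY_NUMBER, TOY_CONS };

/* Hypothetical stand-in for a Lisp object.  */
struct toy_obj
{
  enum toy_type type;
  struct toy_obj *car, *cdr;   /* used only when type == TOY_CONS */
};

#define TOY_MAX_SEEN 64
static struct toy_obj *toy_seen[TOY_MAX_SEEN];
static int toy_seen_count;
static int toy_seen_lookups;   /* how many linear scans were performed */

/* Linear scan, standing in for the list search on seen_list.  */
static bool
toy_seen_p (const struct toy_obj *o)
{
  toy_seen_lookups++;
  for (int i = 0; i < toy_seen_count; i++)
    if (toy_seen[i] == o)
      return true;
  return false;
}

/* Replace every occurrence of PLACEHOLDER inside SUBTREE with OBJECT.  */
static struct toy_obj *
toy_substitute (struct toy_obj *object, struct toy_obj *placeholder,
                struct toy_obj *subtree)
{
  assert (subtree != NULL);
  if (subtree == placeholder)
    return object;

  /* Short-circuit: a symbol or number cannot contain the placeholder,
     so return it without scanning the seen list.  */
  if (subtree->type == TOY_SYMBOL || subtree->type == TOY_NUMBER)
    return subtree;

  if (toy_seen_p (subtree))
    return subtree;             /* already processed; also stops cycles */
  assert (toy_seen_count < TOY_MAX_SEEN);
  toy_seen[toy_seen_count++] = subtree;

  subtree->car = toy_substitute (object, placeholder, subtree->car);
  subtree->cdr = toy_substitute (object, placeholder, subtree->cdr);
  return subtree;
}
```

In this model, a list of N leaf elements costs N scans of the seen array (one per cons cell) rather than roughly 2N, and the entries that are skipped are exactly the ones that could never match.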

I had my getc_unlocked change sitting around so I pulled that in.  Run time is 
now 0.6s, with Fassq at 57% and Frassq at 18%.
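
For readers unfamiliar with getc_unlocked, here is a minimal sketch of the POSIX calls involved.  It is not the patch mentioned above; `toy_count_bytes` is a hypothetical helper, and taking the lock explicitly with flockfile is an assumption made to keep the example safe in isolation.

```c
/* Request the POSIX declarations under strict -std= modes.  */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Count the bytes remaining in STREAM.  The stdio lock is taken once
   for the whole loop instead of once per character, which is what
   plain getc does on a thread-safe libc.  */
static long
toy_count_bytes (FILE *stream)
{
  assert (stream != NULL);
  long n = 0;
  flockfile (stream);
  while (getc_unlocked (stream) != EOF)
    n++;
  funlockfile (stream);
  return n;
}
```

The saving is per character, so it only shows up on large inputs such as a multi-megabyte dumped.elc.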

Next on the profiling chart is oblookup, but it’s only at 4% so I’m going to 
ignore OBARRAY_SIZE for now.  However, OBARRAY_SIZE could affect the order of 
atoms in processing, which could drastically rearrange the ordering of the data 
structures in dumped.elc.

I think the next step is to look at replacing read_objects, probably with a 
pair of hash tables, but it’s getting a bit late for trying that tonight.
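
One plausible reading of "a pair of hash tables" is one table keyed by the #N label and one keyed by the object, replacing the assq-style and rassq-style list scans respectively.  The sketch below is a toy under that assumption: every name in it is hypothetical, the table has a fixed size with no resizing, and it is unrelated to Emacs's own hash table implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 1024   /* power of two; this sketch never resizes */

struct toy_entry { uintptr_t key, value; int used; };
struct toy_table { struct toy_entry slots[TOY_SLOTS]; };

static size_t
toy_hash (uintptr_t k)
{
  return (size_t) ((k * 2654435761u) >> 7) & (TOY_SLOTS - 1);
}

/* Insert or overwrite KEY -> VALUE.  Return 0 on success, -1 if full.  */
static int
toy_put (struct toy_table *t, uintptr_t key, uintptr_t value)
{
  assert (t != NULL);
  size_t i = toy_hash (key);
  for (size_t probes = 0; probes < TOY_SLOTS; probes++)
    {
      if (!t->slots[i].used || t->slots[i].key == key)
        {
          t->slots[i] = (struct toy_entry) { key, value, 1 };
          return 0;
        }
      i = (i + 1) & (TOY_SLOTS - 1);   /* linear probing */
    }
  return -1;
}

/* Look up KEY.  Return 1 and store the value in *VALUE if found,
   otherwise return 0.  */
static int
toy_get (const struct toy_table *t, uintptr_t key, uintptr_t *value)
{
  assert (t != NULL && value != NULL);
  size_t i = toy_hash (key);
  for (size_t probes = 0; probes < TOY_SLOTS; probes++)
    {
      if (!t->slots[i].used)
        return 0;
      if (t->slots[i].key == key)
        {
          *value = t->slots[i].value;
          return 1;
        }
      i = (i + 1) & (TOY_SLOTS - 1);
    }
  return 0;
}

/* The pair: label -> object (assq direction) and object -> label
   (rassq direction).  Must start out zero-initialized.  */
struct toy_read_objects { struct toy_table by_number, by_object; };

/* Record that label N refers to OBJ in both tables.  */
static int
toy_record (struct toy_read_objects *r, uintptr_t n, void *obj)
{
  if (toy_put (&r->by_number, n, (uintptr_t) obj) != 0)
    return -1;
  return toy_put (&r->by_object, (uintptr_t) obj, n);
}
```

Both directions then cost an expected constant number of probes per lookup instead of a scan proportional to the number of labelled objects read so far.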

