>> Performance issues with read access to directories containing less than
>> 10K files seem like something that was solved last century, so
>> I wouldn't worry very much about it.
> Per my response to Eli, I see (network) directories become almost unusable
> somewhere around 1000 files,
I don't doubt there are still (in the current century) cases where
largish directories get slow, but what I meant is that it's now
considered a problem that should be solved by making those
directories fast rather than by avoiding making them so large.
Unfortunately, sometimes we have to cope with the environment we use. And for all I know, some of the performance penalties may be inherent in the (security-related) infrastructure requirements of a highly regulated industry.
Not that that should be a primary concern for the development team, but it is something a local packager might be stuck with.
>> [ But that doesn't mean we shouldn't try to compile several ELisp files
>> into a single ELN file, especially since the size of ELN files seems
>> to be proportionally larger for small ELisp files than for large
>> ones. ]
> Since I learned of the native compiler in 28.1, I decided to try it out and
> also "throw the spaghetti at the wall" with a bunch of packages that
> provide features similar to those found in more "modern" IDEs. In terms of
> startup time, the normal package system does not deal well with hundreds of
> directories on the load path, regardless of AOT native compilation, so I'm
> transforming the packages to install in the version-specific load path, and
> compiling that ahead of time. At least for the ones amenable to such
There are two load-paths at play (`load-path` and
`native-comp-eln-load-path`) and I'm not sure which one you're talking
about. OT1H `native-comp-eln-load-path` should not grow with the number
of packages so it typically contains exactly 2 entries, and definitely
not hundreds. OTOH `load-path` is unrelated to native compilation.
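For concreteness, both variables can be inspected directly; the values shown in the comments are illustrative of a typical GNU/Linux install, not verbatim:

```elisp
;; `load-path' lists the directories searched for .el/.elc files; with
;; the default package system it grows by roughly one entry per package.
(length load-path)

;; `native-comp-eln-load-path' lists the directories searched for .eln
;; files; typically just the per-user cache plus the system directory.
native-comp-eln-load-path
;; e.g. ("~/.emacs.d/eln-cache/" "/usr/lib/emacs/28.1/native-lisp/")
```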
Not entirely - as I understand it, the load system first finds the source file and computes a hash before determining whether there is an ELN file corresponding to it.
Although I do wonder if there is some optimization for ELN files in the system directory as opposed to the user's cache. I have one build where I native-compiled (but did not byte-compile) all the .el files in the lisp directory, and another where I byte-compiled and then native-compiled the same set of files. In both cases I used the flag to batch-native-compile to put the ELN files in the system cache. In the first case a number of files failed to compile; in the second, they all compiled.

I've also observed another situation where a file will only (byte- or native-) compile if one of its required files has been byte-compiled ahead of time - but only native-compiling that dependency resulted in the same behavior as not compiling it at all. I planned to send a separate mail to the list asking whether this is intended behavior once I had reduced it to a simple case, or whether it should be submitted as a bug.
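If it helps to see that mapping concretely, native-comp builds expose the source-to-ELN translation directly; the hash of the source file ends up embedded in the output file name (the path in the comment is illustrative):

```elisp
;; Ask the native compiler where the .eln for a given .el would live.
;; Only available in builds configured with native compilation.
(comp-el-to-eln-filename "/usr/share/emacs/28.1/lisp/subr.el")
;; => something like ".../native-lisp/28.1-<abi-hash>/subr-<file-hash>.eln"
```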
In any case, I noticed that the "browse customization groups" buffer is noticeably faster in the second case. I need to try it again to confirm that it wasn't just waiting on the relevant source files to compile in the first case.
I also don't understand what you mean by "version-specific load path".
In the usual Unix installation, there is a "site-lisp" directory one level above the version-specific installation directory, and another site-lisp inside the version-specific installation directory. I'm referring to installing the source (ultimately) in ..../emacs/28.1/site-lisp. During the build it's just in the site-lisp subdirectory of the source root path.
Also, what kind of startup time are you talking about?
E.g., are you using `package-quickstart`?
That was the first alternative I tried. With 1250 packages, it did not work. First, the generated file consisted of a series of "let" forms corresponding to the package directories, and apparently autoload forms are ignored if they appear anywhere below top level. At least, I got a number of warnings to that effect.
The other problem was that I got a "bytecode overflow error". I only got the first error after truncating the file to approximately its first 10k lines. Oddly enough, when I put all the files in the site-lisp directory and collect all the autoloads for that directory in a single file, it has no problem with the 80k-line file that results.
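For reference, a minimal sketch of the shape I'm describing (package and function names hypothetical, not the actual generated contents):

```elisp
;; A top-level autoload form is registered normally:
(autoload 'foo-mode "foo" "Enable foo." t)

;; But the generated quickstart file wrapped each package's forms in a
;; `let', roughly like this, and forms below top level triggered the
;; "ignored" warnings:
(let ((load-path (cons "/path/to/foo-1.0" load-path)))
  (autoload 'foo-mode "foo" "Enable foo." t))
```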
> Given I'm compiling all the files AOT for use in a common installation
> (this is on Linux, not Windows), the natural question for me is whether
> larger compilation units would be more efficient, particularly at startup.
It all depends where the slowdown comes from :-)
E.g. `package-quickstart` follows a similar idea to the one you propose
by collecting all the `<pkg>-autoloads.el` into one big file, which
saves us from having to load all those little files separately. It also
saves us from having to look for them through those hundreds of directories.
I suspect a long `load-path` can itself be a source of slowdown,
especially during startup, but I haven't bumped into that yet.
There are ways we could speed it up, if needed:
- create "meta packages" (or just one containing all your packages),
which would bring together in a single directory the files of several
packages (and presumably also bring together their
`<pkg>-autoloads.el` into a larger combined one). Under GNU/Linux we
could have this metapackage be made of symlinks, making it fairly
efficient and non-obtrusive (e.g. `C-h o` could still get you to the
actual file rather than its metapackage-copy).
- Manage a cache of where our ELisp files are (i.e. a hash table
mapping relative ELisp file names to the absolute file names found
by looking them up in `load-path`). This way we can usually avoid
scanning those hundreds of directories to find the .elc file we need, and
go straight to it.
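A minimal sketch of that second idea, assuming we only cache successful lookups and invalidate by hand (names hypothetical):

```elisp
;; Hypothetical cache mapping a relative file name (as passed to
;; `load'/`require') to the absolute name found on `load-path'.
(defvar my-load-cache (make-hash-table :test #'equal))

(defun my-locate-cached (file)
  "Find FILE on `load-path', consulting a cache before scanning."
  (or (gethash file my-load-cache)
      (let ((abs (locate-file file load-path (get-load-suffixes))))
        (when abs
          (puthash file abs my-load-cache))
        abs)))
```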
I'm pretty sure `load-path` is an issue with 1250 packages, even if half of them consist of single files.
Since I'm preparing this for a custom installation that will be accessible to multiple users, I decided to try putting everything in site-lisp and native-compiling everything AOT. Most of the other potential users are not experienced Unix users, which is why I'm trying to make everything work smoothly up front and provide features they would find familiar from other editors.
One issue with this approach is that the package selection mechanism doesn't recognize the modules as installed, or provide any assistance in selectively activating them.
Other places where there is a noticeable slowdown with large numbers of packages:
* Browsing customization groups - just unfolding a single group can take minutes (this is on fast server hardware with a lot of free memory)
* Browsing custom themes with many theme packages installed
I haven't gotten to the point where I can test the same situation by explicitly loading, from the site-lisp directory, the same modules that had been activated as packages. Installing the themes in the system directory does skip the "suspicious files" check that occurs when loading them from the user configuration.
> I posed the question to the list mostly to see if the approach (or similar)
> had already been tested for viability or effectiveness, so I can avoid
> unnecessary experimentation if the answer is already well-understood.
I don't think it has been tried, no.
> I don't know enough about modern library loading to know whether you'd
> expect N distinct but interdependent dynamic libraries to be loaded in as
> compact a memory region as a single dynamic library formed from the same
> underlying object code.
I think you're right here, but I'd expect the effect to be fairly small
except when the .elc/.eln files are themselves small.
There are a lot of packages that have fairly small source files, simply because they've factored their code the same way it would be in languages where shared libraries are not in one-to-one correspondence with source files.
> It's not clear to me whether those points are limited to call
> sites or not.
I believe it is: the optimization is to replace a call via `Ffuncall` to
a "symbol" (which looks up the value stored in the `symbol-function`
cell) with a direct call to the actual C function contained in the
"subr" object (expected to be) found in that `symbol-function` cell.
Andrea would know if there are other semantics-non-preserving
optimizations at optimization level 3, but IIUC this is very
much the main one.
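For context, the level in question is controlled by the `native-comp-speed` variable; a sketch of opting in (level 3 is opt-in precisely because it trades redefinability for direct calls):

```elisp
;; Default is 2; 3 additionally enables the semantics-affecting
;; optimizations discussed above, such as direct calls into subr
;; objects that bypass the `symbol-function' indirection.
(setq native-comp-speed 3)

;; It can also be set file-locally so only selected files opt in:
;; -*- native-comp-speed: 3; -*-
```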
>> IIUC the current native-compiler will actually leave those
>> locally-defined functions in their byte-code form :-(
> That's not what I understood from
> As you deduce below, I come from a Scheme background - cl-flet is the form
> I should have referenced, not let.
Indeed you're right that those functions can be native-compiled, tho only if
they're closed (i.e. if they don't refer to surrounding lexical variables).
[ I always forget that little detail :-( ]
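To illustrate the distinction with a sketch (function names hypothetical):

```elisp
(require 'cl-lib)

;; "Closed" local function: `double' refers to no surrounding lexical
;; variables, so it can be compiled as a standalone function.
(defun my-double-all (xs)
  (cl-flet ((double (x) (* 2 x)))
    (mapcar #'double xs)))

;; Not closed: `scale' captures the lexical variable `n', so it must
;; remain a closure over the surrounding environment.
(defun my-scale-all (n xs)
  (cl-flet ((scale (x) (* n x)))
    (mapcar #'scale xs)))
```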
I would expect this to apply to most top-level defuns in ELisp packages/modules. From my cursory review, it looks like the ability to redefine these defuns is mostly useful when developing the packages themselves, and "sealing" them for use would be appropriate.
I'm not clear on whether this optimization is limited to the case of calling functions defined in the compilation unit, or applied more broadly.