Re: native compilation units

On Sun, Jun 12, 2022 at 2:47 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:

>> >> In which sense would it be different from:
>> >>
>> >> (cl-flet
>> >> ...
>> >> (defun ...)
>> >> (defun ...)
>> >> ...)
>> >>
> I'm trying to determine if there's a set of expressions for which it
> is semantically sound to perform the intraprocedural optimizations

The cl-flet above is such an example, AFAIK. Or maybe I don't
understand what you mean.

To be clear, I'm trying to first understand what Andrea means by "safe". I'm
assuming it means the result agrees with whatever the byte compiler and VM
would produce for the same code. I doubt I'm bringing up topics or ideas that
are new to you. But if I do make use of semantic/wisent, I'd like to know the
result can be fast (modulo garbage collection, anyway).

I've been operating under the assumption that:

  - Compiled code objects should be first class in the sense that they can be
    serialized just by using print and read. That seems to have been important
    historically, and was true for byte-code vectors for dynamically scoped
    functions. It's still true for byte-code vectors of top-level functions,
    but is not true for byte-code vectors for closures (and hasn't been for at
    least a decade, apparently).
  - It's still worthwhile to have a class of code objects that are immutable
    in the VM semantics, but now because there are compiler passes implemented
    that can make use of that as an invariant.
  - cl-flet doesn't allow mutual recursion, and there is no shared state above,
    so there's nothing to optimize intraprocedurally.
  - cl-labels is implemented with closures, so (as I understand it) the native
    compiler would not be able to produce code if you asked it to compile
    the closure returned by a form like (cl-labels ((f ..) (g...) ...) f)
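
To make the cl-flet / cl-labels distinction concrete, here is a minimal
sketch, assuming a file with lexical binding enabled on its first line. The
names my-demo-even, my-demo-odd and my-demo-evenp are hypothetical and exist
only for this illustration.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; cl-labels bindings are visible inside each other's bodies, so the two
;; local functions can be mutually recursive.  The value of #'my-demo-even
;; is a closure that refers to my-demo-odd, which is why the pair has to
;; stay together when the function escapes the cl-labels form.
(defalias 'my-demo-evenp
  (cl-labels ((my-demo-even (n) (if (= n 0) t (my-demo-odd (1- n))))
              (my-demo-odd (n) (if (= n 0) nil (my-demo-even (1- n)))))
    #'my-demo-even))

;; Call the escaped closure through its global name.
(my-demo-evenp 10)
```

With cl-flet in place of cl-labels, the bodies would not see each other's
local definitions, so this kind of mutual recursion could not be written.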

I also mistakenly thought byte-code-vectors of the sort saved in ".elc" files
would not be able to represent closures without being consed, as the
components (at least the first 4) are nominally constant. But I see that
closures are being implemented by calling an ordinary function that
side-effects the "constants" vector. That's unfortunate because it means the
optimizer cannot assume byte-vectors are constants that can be freely
propagated. OTOH, prior to commit

https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1

it looks like the closures were constructed at compile time rather than by
side-effect, which would mean the VM would be expected to treat them as
immutable, at least.

Wedging closures into the byte-code format that works for dynamic scoping
could be made to work with shared structures, but you'd need to modify
print to always capture shared structure (at least for byte-code vectors),
not just when there's a cycle. The approach that's been implemented only
works at run-time when there's shared state between closures, at least as far
as I can tell.
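
The difference between shared and copied structure after a print/read round
trip can be seen with plain lists. This is only a minimal sketch, assuming
the standard print-circle variable; the helper names are hypothetical, and it
says nothing about how .elc files are actually written.

```elisp
;; Hypothetical helper: round-trip OBJ through the printer and the reader.
(defun my-demo-print-read (obj circle)
  "Print OBJ with `print-circle' bound to CIRCLE, then read it back."
  (let ((print-circle circle))
    (read (prin1-to-string obj))))

;; Hypothetical helper: does object identity survive the round trip?
(defun my-demo-sharing-survives-p (circle)
  "Return non-nil if two references to one list are still `eq' afterwards."
  (let* ((shared (list 1 2))
         (copy (my-demo-print-read (list shared shared) circle)))
    (eq (car copy) (cadr copy))))

;; Without print-circle the two elements come back as separate lists.
(my-demo-sharing-survives-p nil)
;; With print-circle the #N= / #N# notation records the sharing.
(my-demo-sharing-survives-p t)
```

Two closures that are meant to share a variable are in the same position as
the two list elements here: if the sharing is not recorded when they are
written out, each one is read back with its own copy.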

However, it's a hack that will never really correspond closely to the
semantics of shared objects with explicit tracking and load-time linking of
compile-time symbols, because the relocations are already performed and
there's no way to back out where they occurred from the value itself. If a
goal is to have a semantics in which you can

  - unambiguously specify that at load/run time a function or variable name
    is resolved in the compile time environment provided by a separate
    compilation unit as an immutable constant at run-time
  - serialize compiled closures as compilation units that provide a
    well-defined compile-time environment for linking
  - reduce the headaches of the compiler writer by making it easy to
    produce code that is eligible for their optimizations

Then I think the current approach is suboptimal. The current byte-code
representation is analogous to the a.out format. Because the .elc files run
code on load you can put an arbitrary amount of infrastructure in there to
support an implementation of compilation units with exported compile-time
symbols, but it puts a lot more burden on the compiler and linker/loader
writers than just being explicit would.

And I'm not sure what the payoff is. When there wasn't a native compiler (and
associated optimization passes), I suppose there was no pressing reason
to upend backward compatibility. Then again, I've never been responsible
for maintaining a 3-4 decade old application whose installed user base is of
a size I have no idea about, and which runs on everything from chips in
"smart" electric switches to (I assume) the biggest of "big iron", whatever
that means these days.

> I'm trying to capture a function as a first class value.

Functions are first class values and they can be trivially captured via
things like (setq foo (lambda ...)), (defalias 'foo (lambda ...)) and
a lot more, so I guess there's some additional constraint you're expecting but
I don't know what that is.

Yes, I thought byte-code would be treated as constant. I still think it makes
a lot of sense to make it so.

> This was not expected with lexical scope.

You explicitly write `(require 'cl-lib)` but I don't see any

-*- lexical-binding:t -*-

anywhere, so I suspect you forgot to add those cookies that are needed
to get proper lexical scoping.

Ok, wow, I really misread the NEWS for 28.1, where it said

The 'lexical-binding' local variable is always enabled.

I took that to mean "always set". My fault.
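
The effect of a missing cookie can be reproduced without a file by passing
the binding discipline to eval explicitly. A minimal sketch; my-demo-form,
my-demo-x and my-demo-call are hypothetical names, and my-demo-x is assumed
not to be defined as a special variable anywhere.

```elisp
;; The same closure-returning form, evaluated under each discipline below.
(defvar my-demo-form '(let ((my-demo-x 1)) (lambda () my-demo-x)))

(defun my-demo-call (lexical)
  "Evaluate `my-demo-form' with LEXICAL binding or not, then call the result.
Return the symbol `unbound' if the variable is void when the lambda runs."
  (condition-case nil
      ;; The second argument of `eval' selects lexical binding when non-nil.
      (funcall (eval my-demo-form lexical))
    (void-variable 'unbound)))

;; Lexical binding: the lambda closes over my-demo-x.
(my-demo-call t)
;; Dynamic binding: the let has already exited when the lambda is called.
(my-demo-call nil)
```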

> With the current byte-codes, there's just no way to express a call to
> an offset in the current byte-vector.

Indeed, but you can call a byte-code object instead.

Creating the byte code with shared structure was what I meant by one of the
solutions being to "patch compile-time constants" at load, i.e. perform the
relocations directly. The current implementation effectively inlines copies
of the constants (byte-code objects), which is fine for shared code but not
for shared variables. That is, the values that are assigned to
my-global-oddp and my-global-evenp (for test2 after correcting the
lexical-binding setting) do not reference each other. Each is created with an
independent copy of the other.

From:	Lynn Winebarger
Subject:	Re: native compilation units
Date:	Mon, 13 Jun 2022 12:33:19 -0400