Re: Bytecode interoperability: the good and bad

On Fri, 22 Dec 2017 09:08:33 -0500 Stefan Monnier advised:

>> In other kinds of bytecode such as the one for C Python, a bytecode version
>> number is stored in the bytecode file. When there is a change to the
>> bytecode, that number is changed.

> So far, the only changes that have been made to the byte-code language
> is to add new (previously unused) byte codes. So from this perspective
> we have always maintained backward compatibility (you can run a .elc
> compiled with an older Emacs).

While this is a nice intention, it isn't always true. And it is not without downsides.

In the "not true" department, there are instructions 0153 (scan_buffer) and 0163 (set_mark), which aren't handled in the current interpreter sources in bytecode.c, so an old .elc file that contains them won't run the way it once did.

And as pipcet points out, there is this in lread.c:

   
  if (! version || version >= 22)
    readevalloop (Qget_file_char, &input, hist_file_name,
		  0, Qnil, Qnil, Qnil, Qnil);
  else
    {
      /* We can't handle a file which was compiled with
	 byte-compile-dynamic by older version of Emacs.  */
      specbind (Qload_force_doc_strings, Qt);
      readevalloop (Qget_emacs_mule_file_char, &input, hist_file_name,
		    0, Qnil, Qnil, Qnil, Qnil);
    }
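In case it helps readers who haven't looked at where `version` comes from: my reading of lread.c is that safe_to_load_version takes the byte right after the ";ELC" magic at the start of the file (offset 4, which the compiler fills with the major Emacs version) and yields 0 when the header doesn't look like compiled code. Below is a simplified standalone sketch of that check together with the branch quoted above. The names elc_header_version and uses_current_reader are hypothetical, and this is not a transcription of the Emacs source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an Emacs function): return the version byte
   stored right after the ";ELC" magic, or 0 if BUF doesn't start with
   that magic.  Assumes the layout ";ELC" <version byte> ...  */
static int
elc_header_version (const unsigned char *buf, size_t len)
{
  if (len < 5 || memcmp (buf, ";ELC", 4) != 0)
    return 0;
  return buf[4];
}

/* Same test as the lread.c fragment: (! version || version >= 22)
   selects the normal reader; anything else takes the emacs-mule path
   with load-force-doc-strings bound.  */
static int
uses_current_reader (int version)
{
  return version == 0 || version >= 22;
}
```

So a header whose version byte is 26 goes through the normal reader, while one whose byte is 21 takes the emacs-mule branch that exists only for files from those older Emacsen.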

In the "not without downsides" department: keeping all of this around means that anyone who looks at the bytecode interpreter finds it filled with garbage and bloat. That has to carry some technical debt.

> We do not aim to maintain forward compatibility (so whether a .elc file
> compiled with a more recent Emacs will work is not guaranteed), although
> it sometimes does work. When encountering an unknown byte-code, Emacs
> signals an error, so it shouldn't cause a crash nor "something unintended".

It is likely that the code purporting to handle obsolete (or no longer emitted) instructions is broken, since I doubt any of this behavior is tested. And an error is only signaled for an unknown opcode: a subtle change in the semantics of an instruction that is still known can cause unintended effects without any error at all.
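To make the "tested" point concrete, here is a toy dispatch loop, hypothetical throughout and much simpler than bytecode.c, whose default branch reports an unknown opcode instead of executing it. That is the behavior Stefan describes. The kind of test I doubt exists is one that feeds the obsolete opcodes (such as 0153 and 0163 mentioned earlier) through the real interpreter and checks the outcome.

```c
#include <stddef.h>

#define TOY_STACK_MAX 16

/* Hypothetical opcodes for illustration; they are not Emacs's.  */
enum toy_op
{
  TOY_PUSH1 = 01,   /* push the constant 1 */
  TOY_ADD = 02,     /* pop two values, push their sum */
  TOY_RETURN = 03   /* return the top of the stack */
};

enum toy_status { TOY_OK, TOY_UNKNOWN_OPCODE, TOY_STACK_ERROR };

/* Execute CODE.  On TOY_OK, *RESULT holds the returned value; on
   TOY_UNKNOWN_OPCODE, *BAD_OP holds the offending byte.  */
static enum toy_status
toy_exec (const unsigned char *code, size_t len,
          long *result, unsigned char *bad_op)
{
  long stack[TOY_STACK_MAX];
  size_t sp = 0;

  for (size_t pc = 0; pc < len; pc++)
    switch (code[pc])
      {
      case TOY_PUSH1:
        if (sp == TOY_STACK_MAX)
          return TOY_STACK_ERROR;
        stack[sp++] = 1;
        break;
      case TOY_ADD:
        if (sp < 2)
          return TOY_STACK_ERROR;
        stack[sp - 2] += stack[sp - 1];
        sp--;
        break;
      case TOY_RETURN:
        if (sp == 0)
          return TOY_STACK_ERROR;
        *result = stack[sp - 1];
        return TOY_OK;
      default:
        /* Unknown or retired opcode: report it rather than guess
           at what it used to mean.  */
        *bad_op = code[pc];
        return TOY_UNKNOWN_OPCODE;
      }

  /* Ran off the end of CODE without a return instruction.  */
  return TOY_STACK_ERROR;
}
```

Note what this does not catch: if TOY_ADD quietly changed meaning between versions, old code using it would still run to completion with no error reported.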

> Compatibility problems with .elc files compiled with other Emacs
> versions can also come from macros, and those tend to be more frequent
> than the problems introduced by changes to the byte-code. So detecting
> a different byte-code version is not sufficient to catch the most common
> problems anyway.

My understanding of how this would work in a more rational way is that there shouldn't be incompatible changes except between major releases. So I would hope that incompatible macro changes happen only between major releases, never within one, and I'd hope the same holds for bytecode changes.

If someone is up for it, a possibly interesting program to write might be a bytecode lint-and-report tool. It would show the header comment in an .elc file describing which version of Emacs compiled it (compared against the version currently running) and what optimization level was used. It could also scan the instructions for incompatibilities in both the forward and backward directions, and it might optionally know about specific version incompatibilities, say because of macro changes between versions.
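The instruction-scan part of such a tool might look roughly like the sketch below. Emacs opcodes carry zero, one or two bytes of inline operand depending on the opcode, so a scan has to step over operands rather than treating every byte as an instruction. The opcode table here is a placeholder: the opcode values, operand sizes and "since" versions are invented for illustration, and a real tool would have to derive them from bytecode.c and bytecomp.el. All names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical opcode metadata; placeholder values only.  */
struct op_info
{
  unsigned char opcode;
  int operand_bytes;   /* inline operand bytes following the opcode */
  int since_version;   /* first Emacs major version assumed to emit it */
};

static const struct op_info toy_ops[] = {
  { 0001, 0, 19 },
  { 0002, 1, 19 },
  { 0003, 2, 19 },
  { 0004, 0, 24 },
};

enum lint_kind { LINT_OK, LINT_UNKNOWN, LINT_TOO_NEW, LINT_TRUNCATED };

struct lint_report
{
  enum lint_kind kind;
  size_t offset;          /* byte offset of the offending instruction */
  unsigned char opcode;
};

static const struct op_info *
lookup_op (unsigned char op)
{
  for (size_t i = 0; i < sizeof toy_ops / sizeof toy_ops[0]; i++)
    if (toy_ops[i].opcode == op)
      return &toy_ops[i];
  return NULL;
}

/* Linear sweep over CODE, checking each instruction against the table
   for an Emacs whose major version is RUNNING.  Stops at the first
   problem found.  */
static struct lint_report
lint_bytecode (const unsigned char *code, size_t len, int running)
{
  struct lint_report r = { LINT_OK, 0, 0 };
  size_t pc = 0;

  while (pc < len)
    {
      const struct op_info *info = lookup_op (code[pc]);
      r.offset = pc;
      r.opcode = code[pc];
      if (!info)
        {
          r.kind = LINT_UNKNOWN;      /* backward direction: retired */
          return r;
        }
      if (info->since_version > running)
        {
          r.kind = LINT_TOO_NEW;      /* forward direction: too new */
          return r;
        }
      if (len - pc - 1 < (size_t) info->operand_bytes)
        {
          r.kind = LINT_TRUNCATED;    /* operand runs past the end */
          return r;
        }
      pc += 1 + info->operand_bytes;  /* skip opcode and its operand */
    }

  r.offset = len;
  r.opcode = 0;
  return r;
}
```

Reporting the first problem and its offset is enough for a lint; a fuller tool would collect all of them and also compare the header's version against the running Emacs.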

Maybe this could be incorporated into a "safe-load-file" function.

> FWIW, I think Emacs deserves a new Elisp compilation system (either
> a new kind of bytecode (maybe using something like vmgen), or a JIT or
> something): the bytecode we use is basically identical to the one we had
> 20 years ago, yet the tradeoffs have changed substantially in the
> mean time.

I would be interested in hearing more about which specific tradeoffs you mean.

From what I've seen of Emacs Lisp bytecode, I think it would take a lot of effort to use something like vmgen. In a vmgen-generated interpreter the objects are basically plain C values, not Lisp_Objects. Perhaps that could be negotiated, but it would not be trivial.

As for JITing bytecode, haven't there been a couple of efforts in that direction already? Again, this is probably hard.

I'm not saying it shouldn't be done, just that these are very serious projects requiring a lot of effort and time, and they might cause instability in the interim, all while Emacs keeps moving forward on its own.

But in any event, a prerequisite for considering any of this is to understand what we have right now. That's why I'm trying to document it, so that more people at least understand what we are talking about when we discuss replacing or modifying the existing system.

Right now I feel that only a handful of people understand the bytecode, and even they may not understand it in its entirety.

From:	Rocky Bernstein
Subject:	Re: Bytecode interoperability: the good and bad
Date:	Fri, 22 Dec 2017 12:41:30 -0500