emacs-devel

Re: Finding the dump (redux)


From: Ali Bahrami
Subject: Re: Finding the dump (redux)
Date: Sun, 18 Apr 2021 22:01:51 -0600
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.9.1

Hi Eli,

   Your message about not doing anything now, and
possibly, doing something later, is heard loud and clear.
As I said up front, I have a couple of ways to fix this
that don't require anyone else, so I'm fine with that.
I brought this up because I think the way this works
is slightly broken, and could be fixed. I thought it was
worth a shot to see if we can't do something at the source,
rather than having various actors like myself apply one-off
hacks. I'm sure we'll get there.

I am, however, going to continue on and respond to some
of your comments, because I'm not convinced by some of
them, and I'd like to explain why, so that if something is
done later, this stuff would have been discussed.

I also have a question: I'm warming to your suggestion
about how we might just refuse to look for the pdmp files
for symlinks, and instead use only the realpath basename
for those cases. I think that could be a nice simplification
that might be smaller and safer than what is currently on
the table, and which does not add an additional load, hence
no slowdown. If I were to put together a patch to do that,
would you have any interest in looking at it?


On 4/18/21 1:55 AM, Eli Zaretskii wrote:
Then place a symlink emacs.pdmp there, and have the actual pdumper
file where you want it, under any name you want.  Or move/copy the
emacs-* executables to another directory, make the 'emacs' symlink
resolve to those emacs-* files, and have the pdumper files in the same
place.  Or configure with a different value of $libdir when you build
each emacs-* variant, and then have the corresponding emacs.pdmp file
in the directory under /usr/lib that is private to that variant.  Or
use the --dump-file command-line option.

There are many possible solutions that already work, so why insist on
something that doesn't work?  That it happened to work with unexec is
just sheer luck: the upstream Emacs project never explicitly supported
such configurations.

We seem to have a basic difference of opinion about /usr/bin.
I don't think it's OK for programs to drop their data files there,
and I can't think of any significant other examples of programs that
do. You say this as if it's a normal answer, but it seems odd and
atypical to me. So I'm not going to put anything in /usr/bin other
than executables, or symlinks that point at executables.

To be honest, I'm a bit surprised to be the first person to
bring this up, so either I'm alone in thinking it's wrong to
put data files in /usr/bin, or I'm just early. Time will tell
I suppose. I do think putting the pdmp files next to the executable
is a fine answer for other places, particularly in the emacs build
tree. But it doesn't make sense for /usr/bin, to me.

About the idea of moving the binaries out of /usr/bin, where we
could add the pdmp files, the problem there is that we want users
to have all those names in their PATH. Let me explain, illustrated
by this excerpt from the original message:

     % cd /usr/bin
     % ls -alFh emacs*
     lrwxrwxrwx   1 root   root     9 Apr 14 22:15 emacs -> emacs-gtk*
     -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk*
     -r-xr-xr-x   2 root   bin  7.05M Apr 14 22:15 emacs-gtk-27.2*
     -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox*
     -r-xr-xr-x   2 root   bin  6.09M Apr 14 22:15 emacs-nox-27.2*
     -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x*
     -r-xr-xr-x   2 root   bin  7.07M Apr 14 22:15 emacs-x-27.2*
     -r-xr-xr-x   1 root   bin    47K Apr 14 22:15 emacsclient*


The intent here is that users who explicitly want the GTK version
will type 'emacs-gtk' or 'emacs-gtk-27.2'. The story is similar for
the Lucid (emacs-x), or pure tty (emacs-nox) versions. Users who
don't care, and just want to run something reasonable run 'emacs'.
Moving those binaries elsewhere would let us put pdmp files next
to them, yes, but since they won't be in anyone's PATH, it's not
very useful.



A related idea that's been floated before would be for the
executable to carry the default dump data within itself.

That idea didn't fly because it meant we again need to comply to
various binary formats, which change with time out of our control.
We'd eventually get into the same trouble as with unexec: the
corresponding developers will refuse supporting the tricks we play for
that to work, exactly as glibc dropped support for malloc hooks we
needed to support unexec.

More generally, that doesn't solve the general problem of how Emacs
finds files it needs to start.  Even if the dump data is in the
executable itself, there could be other files that are similarly
needed at startup.  We already have that with the native-compilation
feature: the *.eln files produced from the preloaded Lisp packages
need to be located at startup, otherwise Emacs will be unable to
start.  We cannot possibly put everything inside the Emacs executable,
even if we wanted to.

Well, I did say that I wasn't suggesting we go there, and
I agree that we don't want to. It's not a given though that
things must become a mess like unexec. Unexec was a mess
because the approach is inherently messy.


But realpath(argv[0]) can resolve to a file in another directory,
because realpath expands all the symlinks, not just that of the
basename.  Does it make sense to look up the .pdmp file in the
directory of the original argv[0] when it is a symlink?

It's an interesting question, and I think it can be argued
either way.

Exactly.  So who's to say which way is TRT?  Whatever we decide, there
could be another distro out there which will argue that the opposite
makes sense because "it worked for them until now".


I'd say people at the top, like yourself, ultimately decide, just
as you do with many other things that various folks might second-guess
later. My point was that I really don't think it matters in this
case, because both outcomes are defensible. Just pick the one you prefer
and document it.

As I mentioned above though, I'm really warming to
this "no" option, as it solves the problem, is simple
to explain, and doesn't add any additional loads.


I can imagine a scenario where it might be useful to
say "yes". It might offer a pretty slick way for end users
to create arbitrary pdmp files and associate them to specific
purposes. Suppose for instance that I want to use a special 'X'
dump file when working on "Project X" code. I could create a special
name for that emacs variant as a symlink to the basic emacs-gtk
in my personal bin:

     % ln -s /usr/bin/emacs-gtk ~/bin/emacsX

Then, if I were to create ~/bin/emacsX.pdmp, and if emacs were
willing to see it as a pdmp file to be loaded, then I could
run my special emacsX, and get the standard emacs (from the
symlink) using my specialized X pdmp.

We support the --dump-file command-line option for this purpose: using
that you can have the pdumper file under any file name you want, all
you need is a shell script or an alias that would add that option.

I think that's a good answer. And, it's also possibly how we might
settle the  "Who's to say" question posed above. If we decide not
to load a pdmp file based on the name of a symlink, then the fix
becomes a matter of simply looking for the realpath basename in
PATH_EXEC, where we currently look for the given basename.
The number of possible loads remains the same as before,
and debates about slowdowns become moot.
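
As a rough illustration of the lookup order discussed above, here is a
minimal sketch in Python. It is a simplified model based only on the
description in this thread, not a transcription of the C code in
load_pdump(). The function name dump_candidates, its parameters, and the
temporary directory layout are all hypothetical, and it assumes a POSIX
system where symlinks can be created.

```python
import os
import tempfile


def dump_candidates(exe_path, path_exec, dump_file=None, use_realpath_name=True):
    """Return the .pdmp paths to try, in order (simplified model)."""
    if dump_file is not None:
        # An explicit --dump-file short-circuits every other stage.
        return [dump_file]
    real_exe = os.path.realpath(exe_path)
    # Stage 1: a dump file sitting next to the resolved executable.
    candidates = [real_exe + ".pdmp"]
    # Stage 2 (backstop): a dump file in PATH_EXEC, named after either
    # the resolved binary or the name the user typed.
    name_source = real_exe if use_realpath_name else exe_path
    candidates.append(
        os.path.join(path_exec, os.path.basename(name_source) + ".pdmp"))
    return candidates


# Demo layout: bin/emacs -> emacs-gtk, mirroring the /usr/bin listing.
root = os.path.realpath(tempfile.mkdtemp())
bindir = os.path.join(root, "bin")
os.mkdir(bindir)
open(os.path.join(bindir, "emacs-gtk"), "w").close()
os.symlink("emacs-gtk", os.path.join(bindir, "emacs"))

libexec = os.path.join(root, "libexec")  # hypothetical stand-in for PATH_EXEC
by_real_name = dump_candidates(os.path.join(bindir, "emacs"), libexec)
by_given_name = dump_candidates(os.path.join(bindir, "emacs"), libexec,
                                use_realpath_name=False)
print(by_real_name[-1])   # PATH_EXEC candidate named after the real binary
print(by_given_name[-1])  # PATH_EXEC candidate named after the symlink
```

Both variants produce the same number of candidates. Only the name used
in the PATH_EXEC stage differs, which is the sense in which the number of
possible loads stays the same.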


> And if you are thinking about trying both, then (a) there's still the
> question of order (which could affect the correctness), and (b) it
> makes the startup slower, and soon enough people will start
> complaining about that.
...

The reverse question is, what harm does it do to look in PATH_EXEC
for both names?

See above: it makes startup slower, and also runs the risk of picking
up the wrong pdumper file and failing the startup altogether.


I'm not buying that this makes startup slower, and there
are 2 layers to my reasoning.

The first layer is that operating systems put a lot of effort
into making stat() on local files cheap. Anything that does
path searching like shells, or like emacs when it searches
for lisp files, relies on this. Certainly, there's often a
cache involved as well, but those cases do many lookups,
rather than the 2 we currently do, or the additional one
(making 3) that I'm suggesting. You can measure the cost of
this added stat(), but you'll never feel it.
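
For anyone who wants to check the claim on their own system, the cost of
one failed lookup can be estimated with a sketch like the following. The
helper name mean_stat_cost and the probed path are hypothetical, and the
figure it reports includes Python call overhead, so it overstates what
the equivalent stat() in C would cost.

```python
import os
import timeit


def mean_stat_cost(path, repeats=10000):
    """Return the mean wall-clock seconds per os.stat() call on path."""
    def probe():
        try:
            os.stat(path)
        except OSError:
            # A missing file is the expected case for a failed lookup.
            pass
    return timeit.timeit(probe, number=repeats) / repeats


# Hypothetical path that is assumed not to exist on the test machine.
cost = mean_stat_cost("/nonexistent-dir/emacs-gtk.pdmp")
print(f"{cost * 1e6:.2f} microseconds per failed stat()")
```

The result varies by machine and filesystem, so no expected value is
given here.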

The second layer is that we're talking about the stage where
we start looking at PATH_EXEC. The PATH_EXEC stage is a backstop
that is only run when the --dump-file command-line option was
not used, and no pdmp file is found next to the executable. So
in the world that follows your advice of using those features,
the PATH_EXEC stage never runs, and costs 0.

If we do reach the PATH_EXEC stage, and we fail to find a pdmp
file, then the next thing that happens is that emacs will
proceed to search for, compile, and load numerous elisp files,
spewing their names to stdout as it goes. The cost of this is
definitely felt, unlike the attempt to open the realpath basename
version of the pdmp file, which if successful, will prevent this
expensive outcome.

So now, let's think about the issue of finding the wrong
pdumper file. I'm not sure I see how this can happen. The
PATH_EXEC directory isn't a place where emacs users put
arbitrary content. The names found here correspond to the
names that emacs is installed under on the system. If the
user invents their own emacs name (e.g. myemacs), then there
will be no file in PATH_EXEC for them to accidentally load.
And if they run emacs under one of the installed names, then
they're going to find the right file.

One point I'd make here is that your suggestion that
we not chase pdmp files for symlinks used to run emacs
really simplifies this, because then the only names we'll
ever look for in PATH_EXEC are those of the actual
installed binaries, and assuming the binary names and
pdmp names match, there can be no mixups.


This is not enough, if we want to support *.pdmp files that have
arbitrary names.  For example, when Emacs is invoked as "../emacs" (or
any other relative file name which includes slashes), we currently
don't expand symlinks, so with your proposal "emacs" and "../emacs"
will behave differently.

I'm not sure I understand. I have the proposed bits installed
on my desktop right now, and this does work as I expect.

      % cd /usr/bin
      % ../bin/emacs

As does

      % emacs

That's because you are running Emacs installed, so it looks for the
pdumper file in the hardcoded place under PATH_EXEC, no matter what.
I was alluding to the case that you run Emacs uninstalled, when the
pdumper file is in the same directory where the Emacs binary lives.

In the case where emacs is uninstalled and the pdumper file is
next to it, we never look in PATH_EXEC, so my patch, which
alters that code, is irrelevant.


I don't see any code in load_pdump() that special cases
the case that includes slashes

Look in load_pdump_find_executable, and you will see it.

I do see it, thanks. But note that load_pdump() calls
realpath() on the result from load_pdump_find_executable(),
and so, both 'emacs' and '../bin/emacs' yield the same
absolute path (e.g. /usr/bin/emacs-gtk) in either case,
and my patch sees the same string in either case.
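
The realpath() behavior described here can be checked outside Emacs with
a small Python sketch. The directory layout is a hypothetical stand-in
for /usr/bin built in a temporary directory, and it assumes a POSIX
system where symlinks can be created.

```python
import os
import tempfile

# Build a throwaway bin/ with emacs -> emacs-gtk, as in the listing.
root = os.path.realpath(tempfile.mkdtemp())
bindir = os.path.join(root, "bin")
os.mkdir(bindir)
target = os.path.join(bindir, "emacs-gtk")
open(target, "w").close()
os.symlink("emacs-gtk", os.path.join(bindir, "emacs"))

# The same symlink reached directly and through a path containing "..".
direct = os.path.realpath(os.path.join(bindir, "emacs"))
relative = os.path.realpath(os.path.join(bindir, "..", "bin", "emacs"))

print(direct == relative == target)
```

Both spellings resolve to the one absolute path of the real binary, so
any code that runs after the realpath() call receives the same string.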


Having said all of the above, since we are currently working on
related issues on the native-compilation branch, it is possible that
we eventually will teach Emacs to support also the arrangement you
want to work in your case.  But I make no promises, and in any case
this will not hit the street before Emacs 28.1, which is probably
still a year or more in the future.  We don't expect another 27.x
release, and even if there is such a release, it will probably be to
fix some very grave bug, so unsuitable for extending existing
features.  So it's your call whether to wait for Emacs 28 in the hope
that maybe it fixes your problem, or redesign your deployment now to
use some arrangement that already works.


OK, sounds good. Thanks.

- Ali


