bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le

From: Paul Eggert
Subject: bug#33174: 27.0.50; Dump fails on GNU/Linux ppc64le
Date: Mon, 29 Oct 2018 22:58:19 -0700

Thomas Fitzsimmons wrote:
> BTW, let me know if you don't think it's useful to debug this further.
> I'm OK just disabling randomization when I build Emacs for the time
> being and waiting until the portable dumper work lands, but I'm happy to
> continue if you think it will lead to a general fix.

It's not clear when the portable dumper will land; it might not ever land, unfortunately. So I would like to work on bug#33174 a bit longer, if only so that we can put something intelligible into the PROBLEMS file.

> It seems like it's crashing when trying to memcpy over the BSS area, on
> this line in unexelf.c (see below):

By the time the memcpy is run the damage has already been done: the memory layout is messed up, and we can't fix that simply by passing different arguments to memcpy. We have to prevent the memory layout from being messed up in the first place, by disabling the undesirable address space layout randomization very early in execution.

The key question for me is in this set of system calls:

58215 personality(0xffffffff)           = 0 (PER_LINUX)
58215 personality(PER_LINUX|ADDR_NO_RANDOMIZE) = 0 (PER_LINUX)
58215 personality(0xffffffff)           = 0x40000 (PER_LINUX|ADDR_NO_RANDOMIZE)
58215 brk(NULL)                         = 0x27070000
58215 dup2(0, 0)                        = 0
58215 dup2(1, 1)                        = 1
58215 dup2(2, 2)                        = 2
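
For reading the trace above: personality(0xffffffff) only queries the current persona without changing it, and 0x40000 is the ADDR_NO_RANDOMIZE bit on GNU/Linux. A minimal C sketch of that decoding follows; the helper name aslr_disabled_in is made up for illustration and does not come from the Emacs sources.

```c
#include <stdbool.h>
#include <sys/personality.h>

/* Return true if the ADDR_NO_RANDOMIZE bit is set in PERSONA,
   i.e. if address space layout randomization is turned off.
   Hypothetical helper, for illustration only.  */
bool
aslr_disabled_in (unsigned long persona)
{
  return (persona & ADDR_NO_RANDOMIZE) != 0;
}
```

Applied to the three return values in the trace, this gives false for the first two calls (persona 0, PER_LINUX) and true for the third (0x40000), which is what one would expect if randomization was on at startup and was then switched off.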

Surely the call to disable_address_randomization () must have returned true, but can you verify that, either via GDB or (shudder) by inserting print statements?

Also, the call from 'main' to getenv ("EMACS_HEAP_EXEC") must have returned NULL. Can you also verify this?

And it appears that 'main' must have called xputenv ("EMACS_HEAP_EXEC=true") and execvp (argv[0], argv). But how can this be, since the trace shows no execve syscall? This is the heart of the mystery, and we can find out more about it by using GDB to put breakpoints on 'personality', 'getenv', 'xputenv' and/or 'execvp' and seeing what's going on. Something like this, perhaps:

$ gdb temacs
(gdb) set disable-randomization off
(gdb) b personality
(gdb) b getenv
(gdb) b xputenv
(gdb) b execvp
(gdb) r --batch --load loadup bootstrap

and seeing which of these functions get executed in what order, and what they return.
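
For reference while stepping through those breakpoints, here is a rough C sketch of the control flow I am describing: query the persona, try to set ADDR_NO_RANDOMIZE, and re-exec once, guarded by an environment variable. It is a simplified illustration under my own assumptions, not the code in the Emacs tree; the names try_disable_aslr, should_reexec and maybe_reexec are made up, and plain putenv stands in for xputenv.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/personality.h>
#include <unistd.h>

/* Try to turn off ASLR for this process.  Return true only if it
   was on before and is verifiably off now.  This issues the same
   three personality calls that appear in the trace.  */
bool
try_disable_aslr (void)
{
  int pers = personality (0xffffffff);	/* query only */
  if (pers < 0)
    return false;
  int desired = pers | ADDR_NO_RANDOMIZE;
  return (pers != desired
	  && personality (desired) == pers
	  && personality (0xffffffff) == desired);
}

/* Re-exec only if ASLR was just disabled and the guard variable
   is not yet set; the guard prevents an endless exec loop.  */
bool
should_reexec (bool aslr_newly_disabled, char const *guard)
{
  return aslr_newly_disabled && guard == NULL;
}

/* Called early from 'main'.  On success execvp does not return;
   if it fails, report the error and carry on anyway.  */
void
maybe_reexec (char **argv)
{
  if (should_reexec (try_disable_aslr (), getenv ("EMACS_HEAP_EXEC")))
    {
      putenv ((char *) "EMACS_HEAP_EXEC=true");
      execvp (argv[0], argv);
      perror (argv[0]);
    }
}
```

If the real code behaves like this sketch, then three personality calls with those return values should be followed by an execve in the trace, which is exactly what is missing.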

