lightning

Re: Lightning loops indefinitely inside jit_emit on M1 Macs


From: Darren Kulp
Subject: Re: Lightning loops indefinitely inside jit_emit on M1 Macs
Date: Sat, 20 Mar 2021 09:40:29 -0400

Hello again,

I think I have learned that my original problem is that MAP_JIT seems to be required on M1 Macs (at least on my macOS 11.2.2) when combining PROT_WRITE and PROT_EXEC, but there might also be another issue.

I did not originally understand how to build correctly with debugging (since `./configure --help` does not seem to show anything related to debugging), but after I compiled with `./configure --enable-assertions`, I found that the mmap() call was actually failing the first time (with a _jit->code.length of 4096) :

kulp@ego lightning-2.1.3 % DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib
Assertion failed: (_jit->code.ptr != MAP_FAILED), function _jit_emit, file lightning.c, line 2027.
zsh: abort      DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib
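The assertion only says that mmap() returned MAP_FAILED; printing errno would narrow down why. A minimal sketch, assuming a hypothetical helper try_map_rwx() that is not part of GNU lightning, and passing -1 as the file descriptor where lightning.c passes mmap_fd:

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANON under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* Hypothetical helper (not a lightning function): request an anonymous
 * read/write/execute mapping with the same protection bits as the failing
 * call, but without MAP_JIT, and report errno instead of aborting.
 * Returns the mapping, or NULL with *err set to the saved errno. */
void *try_map_rwx(size_t length, int *err)
{
    assert(err != NULL);

    void *ptr = mmap(NULL, length,
                     PROT_EXEC | PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (ptr == MAP_FAILED) {
        *err = errno;   /* save before any other call can overwrite it */
        fprintf(stderr, "mmap(%zu bytes, RWX) failed: %s\n",
                length, strerror(*err));
        return NULL;
    }
    *err = 0;
    return ptr;
}
```

Calling try_map_rwx(4096, &err) mirrors the 4096-byte request above; which errno value macOS reports in that case is not shown in this thread.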

I found out that macOS has a MAP_JIT flag for mmap() in order to allow combining PROT_WRITE and PROT_EXEC : 


See also comments in this pull request I found :

When I added MAP_JIT flag like this at the affected mmap() call :

_jit->code.ptr = mmap(NULL, _jit->code.length,
PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANON | MAP_JIT, mmap_fd, 0);

then I no longer saw that assertion. Instead I see a bus error later :

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1001d0000)
    frame #0: 0x0000000100128ec0 liblightning.1.dylib`_emit_code [inlined] _oxxx7(_jit=0x00000001002069f0, Op=-1451229184, Rt=29, Rt2=30, Rn=31, Simm7=-20) at jit_aarch64-cpu.c:1027:5 [opt]
   1024     i.Rt2.b = Rt2;
   1025     i.Rn.b = Rn;
   1026     i.imm7.b = Simm7;
-> 1027     ii(i.w);

but the debugger seems to get mismatching DWARF info when optimizations are enabled.

(lldb) p i
error: Couldn't materialize: couldn't get the value of variable i: DW_OP_piece for offset 1 but top of stack is of size 9
error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression
(lldb) frame variable
(jit_state_t *) _jit = 0x0000000100304160
(jit_int32_t) Op = -1451229184
(jit_int32_t) Rt = 29
(jit_int32_t) Rt2 = 30
(jit_int32_t) Rn = 31
(jit_int32_t) Simm7 = -20
(instr_t) i = <DW_OP_piece for offset 1 but top of stack is of size 9>
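Even without a usable `i`, the arguments in the frame are enough to reconstruct the instruction word being written. The sketch below packs them according to the A64 load/store-pair layout; encode_pair() is a hypothetical name, and lightning's own _oxxx7() fills a bitfield union instead of shifting by hand:

```c
#include <stdint.h>

/* Pack the fields of an AArch64 load/store-pair instruction:
 * bits 0-4 Rt, 5-9 Rn, 10-14 Rt2, 15-21 imm7 (two's complement);
 * the remaining opcode bits are supplied in op. */
uint32_t encode_pair(uint32_t op, unsigned rt, unsigned rt2,
                     unsigned rn, int simm7)
{
    return op
         | (((uint32_t)simm7 & 0x7fu) << 15)
         | ((rt2 & 0x1fu) << 10)
         | ((rn  & 0x1fu) << 5)
         |  (rt  & 0x1fu);
}

/* Byte offset that imm7 stands for in the 64-bit register form,
 * where the immediate is scaled by 8. */
int pair_offset_bytes(int simm7)
{
    return simm7 * 8;
}
```

Op = -1451229184 is 0xA9800000 as an unsigned 32-bit value, the 64-bit pre-indexed STP opcode, so encode_pair(0xA9800000u, 29, 30, 31, -20) yields 0xA9B67BFD, which is `stp x29, x30, [sp, #-160]!`. That looks like a function prologue, and the faulting address 0x1001d0000 is page-aligned, which suggests (but does not prove) that the very first store into the mapped buffer is what faults.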


I edited the `configure` to remove `-O2` and rebuilt. Now I get the same bus error but I get more information, which I attached in “debugger-state.txt”.

Attachment: debugger-state.txt
Description: Text document
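One possible explanation for the bus error, which nothing in this thread confirms: on Apple silicon, a MAP_JIT region is either writable or executable for a given thread, and Apple's pthread_jit_write_protect_np() switches between the two states. A minimal sketch under that assumption follows; jit_alloc() and jit_write() are hypothetical names, not lightning functions:

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANON under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#if defined(__APPLE__)
#include <pthread.h>                  /* pthread_jit_write_protect_np() */
#include <libkern/OSCacheControl.h>   /* sys_icache_invalidate() */
#endif

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif
#ifndef MAP_JIT
#define MAP_JIT 0   /* flag does not exist here; OR-ing in 0 is a no-op */
#endif

/* Map a buffer for generated code; returns NULL on failure. */
void *jit_alloc(size_t length)
{
    void *p = mmap(NULL, length,
                   PROT_EXEC | PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Copy machine code into a buffer obtained from jit_alloc(). */
void jit_write(void *dst, const void *code, size_t size)
{
    assert(dst != NULL && code != NULL);
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(0);   /* this thread may now write */
#endif
    memcpy(dst, code, size);
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(1);   /* back to executable */
    sys_icache_invalidate(dst, size);
#elif defined(__GNUC__)
    __builtin___clear_cache((char *)dst, (char *)dst + size);
#endif
}
```

In lightning the writes happen one instruction at a time inside emit_code(), so the equivalent change would presumably bracket the whole emit pass rather than a single memcpy().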


I have also attached some build logs in case they are helpful (these are with -O2 still enabled).

kulp@ego lightning-2.1.3 % ./configure --enable-assertions &> configure.output
kulp@ego lightning-2.1.3 % make V=1 &> make.output

Attachment: make.output
Description: Binary data

Attachment: config.log
Description: Binary data

Attachment: configure.output
Description: Binary data


When I get some more time I will look into this further, since I am sure it is hard for others to debug it with this information.

Darren Kulp

On Mar 13, 2021, at 12:21, Darren Kulp <darren@kulp.ch> wrote:

Thanks for your response. I picked a busy time for me (starting a new job in a new city) so it will take me a bit longer to get back to this than I hoped, but I expect to get you a fuller response within a few weeks.

Darren Kulp

On Mar 11, 2021, at 14:40, Paulo César Pereira de Andrade <paulo.cesar.pereira.de.andrade@gmail.com> wrote:

On Sun, Mar 7, 2021 at 19:17, Darren Kulp <darren@kulp.ch> wrote:

Hello,

 Hi,

Thank you for GNU lightning. It is a great tool and I have appreciated how things generally just work. Right now, I am seeing a rare exception to that rule: when I build GNU lightning 2.1.3 on my M1 MacBook (arm64 architecture), the bundled example programs appear to hang inside jit_emit().

The script below shows what I know so far.

curl -O http://ftp.gnu.org/gnu/lightning/lightning-2.1.3.tar.gz
tar xf lightning-2.1.3.tar.gz
cd lightning-2.1.3
CFLAGS=-g3 LDFLAGS=-g3 ./configure && make
DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib &
sleep 5 # arbitrary sleep time to allow to get stuck
lldb -p $(pgrep rfib)


After those commands, an LLDB backtrace shows this :

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
 * frame #0: 0x000000018e6259c8 libsystem_kernel.dylib`__mmap + 8
   frame #1: 0x000000018e625954 libsystem_kernel.dylib`mmap + 52
   frame #2: 0x000000010025b870 liblightning.1.dylib`_jit_emit(_jit=0x0000000127e06970) at lightning.c:2065:23
   frame #3: 0x0000000100237d2c rfib`main(argc=1, argv=0x000000016fbcb940) at rfib.c:47:9
   frame #4: 0x000000018e679f34 libdyld.dylib`start + 4

Stepping through the code, it appears that the code is stuck in the loop starting at line 2033 of lightning.c, and that emit_code() continues to return NULL each time it is called.

 This should only happen if while generating jit, it notices the instruction
pointer in the mmap'ed area would overflow.

 Can you run under a debug environment? It would be very valuable
to know what value _jit.length has in the first call to emit_code. It
should have calculated a sane value, but to enter an infinite loop,
it probably has a negative, and very small value, as it increments
the size in 4k at a time, and tries again. It really should not even
loop, as it should never miscalculate that badly.
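The grow-and-retry behaviour described here can be sketched as follows. This is a simplified model, not lightning's actual loop: emit_fn stands in for emit_code(), and the max_attempts bound is an addition for illustration, since an unbounded loop spins forever whenever emitting keeps failing:

```c
#include <stddef.h>

/* Hypothetical stand-in for emit_code(): returns nonzero on success,
 * zero when the code would overflow a buffer of buffer_length bytes. */
typedef int (*emit_fn)(size_t buffer_length, void *ctx);

/* Retry emit with a buffer that grows by step bytes per attempt.
 * Returns the length that worked, or 0 after max_attempts failures. */
size_t grow_until_emitted(emit_fn emit, void *ctx, size_t initial,
                          size_t step, unsigned max_attempts)
{
    size_t length = initial;
    for (unsigned i = 0; i < max_attempts; i++) {
        if (emit(length, ctx))
            return length;
        length += step;   /* e.g. 4096, matching "4k at a time" */
    }
    return 0;
}

/* Example emitter: succeeds once the buffer holds *ctx bytes. */
int fits(size_t buffer_length, void *ctx)
{
    return buffer_length >= *(const size_t *)ctx;
}

/* Example emitter that always fails, as when every attempt
 * reports an overflow regardless of the buffer size. */
int never_fits(size_t buffer_length, void *ctx)
{
    (void)buffer_length;
    (void)ctx;
    return 0;
}
```

With needed = 10000, grow_until_emitted(fits, &needed, 4096, 4096, 16) returns 12288 on the third attempt, while never_fits exhausts all 16 attempts and returns 0; without the bound, that second case is the hang reported here.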

 To debug this issue, it should be enough to set a breakpoint in
_jit_emit, then a watchpoint on _jit->code.length

 Should also check what value _jit->code.end has, as it might also
be somehow getting an incorrect value, but in all conditions, it
should be due to bad code generation. Can you also share all
build logs? Maybe the compiler is giving a warning about some
issue that my test environment on Linux and gcc did not have.

I noticed this problem first when I tried to use Homebrew to install GNU lightning on my Mac. Homebrew has “bottles” (binary distributions) compiled for Intel platforms including macOS Big Sur, but not for M1 Macs as of this writing.

kulp@ego /tmp % brew install lightning
Error: lightning: no bottle available!
You can try to install from source with:
 brew install --build-from-source lightning

When I tried to install using `brew install --build-from-source lightning`, I noticed that the `check` process took 100% CPU for a long time (over an hour), so I guessed it must be in an infinite loop, and tried to build it myself as I had previously done successfully on Intel Macs. That is when I discovered the details I show above.

I would have tried to reproduce the issue on master, but I get stuck with autoconf (my autoconf 2.69 rejects some directives in configure.ac).

In a few days I will regain access to my Intel Mac (running the older macOS High Sierra instead of Big Sur), for comparison. Until then, can anyone suggest something else I could try in order to narrow things down?

Darren Kulp

Thanks!
Paulo


