
Re: [Lightning] Weird bug?


From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] Weird bug?
Date: Wed, 5 Mar 2014 16:01:21 -0300

2014-03-05 12:40 GMT-03:00 Bruno Loff <address@hidden>:
> (sorry accidentally pressed send before I finished writing)
>
> Hello,

  Hi,

> I am using gnu lightning to make a compiler for register machines (each
> register is a 64bit word, say). I would like to have many of these register
> machines working simultaneously, and throughout time I am interested in
> adding new register machines to memory (perhaps many thousands of them),
> destroying & deallocating old register machines (again, by the thousands).
>
> First thing I noticed was that lightning allocates a full page (4k of
> memory, or whatever) whenever it wants to emit a function, and that it had

  jit_destroy_state frees all memory, but is supposed to be used when
a code buffer is not going to be used again.

> no way to erase a given piece of code from memory. This wouldn't do, so I
> decided to try and do the following:
>
> Use lightning to generate the code;
>
> find the code length by surrounding the code by start = jit_note, and end =
> jit_note, and then do:
>     length = jit_address(end) - jit_address(start);
>
> Allocate as large a block of memory as I need, with read, write and exec
> permissions;
>
> then copy the code by memcpy(_my_memory_block_code_start,
> jit_address(start), length);
>
> then I cast  _my_memory_block_code_start as a function, and call it.
>     void (*entryPoint)(void *code_address, void *registers_address);
> entryPoint = _my_memory_block_code_start;
>     entryPoint(_my_memory_block_code_start,
> _my_memory_block_registers_start);
>
> I call it with a pointer somewhere else in the same block I allocated
> earlier, and the machine will use that same block to store the registers;
>
> Now the advantage of this is when I want to delete some register machine, I
> can simply erase it and reuse the space.
>
> Of course, the compiled code needs to have access to the registers of the
> machine, and it needs to be able to jump around.
>
> I could have solved the register access by allocating the register space
> before jit_emit'ing the code, but how was I going to do the jumping around?
>
> I was hoping that the following would work:
>
> whenever the register machine wanted to jump to some jit label, call it
> LABEL; what I would do is to store in one of the registers the value:
> jit_address(LABEL) - jit_address(start),
>
> i.e., I store somewhere the difference between where the code starts and
> where the label is.
>
> Then, instead of having an instruction
>
> jmpi <address of LABEL>,
>
> I would have
> R1 = code_address + <value of register holding jit_address(LABEL) -
> jit_address(start)>
> jmpr R1
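
The offset arithmetic in this scheme can be modelled in plain C. This is only a sketch: the helper names (region_length, label_offset, relocate_label) are hypothetical, not part of lightning, and ordinary byte buffers stand in for generated code. It shows that the length is end minus start, and that a label's offset from the start stays valid after the bytes are copied elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a region: end minus start (the reverse would be negative). */
static size_t region_length(const unsigned char *start,
                            const unsigned char *end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/* Offset of a label measured from the start of its region. */
static size_t label_offset(const unsigned char *start,
                           const unsigned char *label)
{
    assert(label >= start);
    return (size_t)(label - start);
}

/* Copy the region [start, end) to dst and return where the label now
 * lives: the new base plus the offset recorded before the copy. */
static unsigned char *relocate_label(unsigned char *dst,
                                     const unsigned char *start,
                                     const unsigned char *end,
                                     const unsigned char *label)
{
    size_t off = label_offset(start, label);
    memcpy(dst, start, region_length(start, end));
    return dst + off;
}
```

Whether real generated code survives such a copy depends on what the generator emitted; the model only covers the address arithmetic.
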
>
> I wasn't sure this was going to work, but my first few tests seemed to do
> OK. And then I came up with a thoroughly weird bug.
>
> I have basically come to the following situation. There is a place in my
> program where I call the instruction:
>
> jit_movi(JIT_R0, 0x0b);
>
> The contents of R0 are then stored in memory by a call to jit_stxi. After the
> program and its lightning-compiled code are executed, I print the contents
> of the memory, and there it is! The bytes 0b 00 00 00 00 00 00 00 are at the
> correct position, as expected.
>
> Then I leave EVERYTHING THE EXACT SAME, except for this single instruction
> call, which I replace with:
>
> jit_movi(JIT_R0, 0x0a);
>
> And then if I print the contents of the memory, instead of the expected 0a
> 00 00 00 00 00 00 00, there will be some weird number, like: A1 70 8F 01 00
> 00 00 00 !!!! (the number will change with different executions)
>
> The bug is really frail and I was really lucky to find it... for instance:
>
> If I do it with 0x0c instead of 0x0a, it will work again.
>
> If I replace the above instruction with
>
> jit_movi(JIT_R0, JIT_R1);
> jit_movi(JIT_R0, 0x0a);
>
> then it will work again!!!! (i.e., 0a 00 00 ... appears in the correct
> position). This was quite weird, since the effect of jit_movi(JIT_R0,
> JIT_R1) is effectively nullified by the following instruction.
>
> Any ideas of what the problem might be? I wonder if it is a problem of code
> alignment or something? Is it a result of copying the bytes somewhere else?
> Is it fixable?

  Only by looking at the actual code could I tell whether it is a lightning
bug. Did you compile lightning yourself, or are you using a prebuilt package?
I suggest testing with lightning built with --enable-assertions.
--enable-disassembler would also be useful for checking the generated code.

  Note that the most common mistake is using registers instead of
immediates or vice versa, as in the example above. It is a no-op there,
so it would not cause problems, but it should have been
jit_movr(JIT_R0, JIT_R1).
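
The register/immediate mix-up can be illustrated with a toy model in plain C. This is only a sketch under assumptions: toy_machine, toy_movi and toy_movr are hypothetical names, not lightning API, and the model assumes register names are plain integer constants (which would explain why passing a register where an immediate is expected still compiles).

```c
#include <assert.h>

/* Hypothetical register ids for a toy machine. */
enum { TOY_R0, TOY_R1, TOY_NREGS };

typedef struct {
    long r[TOY_NREGS];
} toy_machine;

/* Load an immediate constant into a register. */
static void toy_movi(toy_machine *m, int dst, long imm)
{
    assert(dst >= 0 && dst < TOY_NREGS);
    m->r[dst] = imm;
}

/* Copy the contents of one register into another. */
static void toy_movr(toy_machine *m, int dst, int src)
{
    assert(dst >= 0 && dst < TOY_NREGS);
    assert(src >= 0 && src < TOY_NREGS);
    m->r[dst] = m->r[src];
}
```

In this model, toy_movi(&m, TOY_R0, TOY_R1) stores the id of TOY_R1 (the integer 1) in TOY_R0, whereas toy_movr(&m, TOY_R0, TOY_R1) stores whatever TOY_R1 currently holds.
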

  Try running the code under valgrind to ensure it is not accessing released
memory.

  Under gdb, add a breakpoint where the memory is stored to ensure the
proper value is stored, then set a watchpoint on the memory to see whether
it is being changed by code somewhere else.

  There are some useful tricks under gdb to debug the jit. For example,
assuming lightning was built with --enable-disassembler, and assuming the
generated code is not too small, rebuild the code generator adding
something like:

jit_note("foo", 1);
jit_movi(JIT_R0, 0xa);
...
jit_note("foo", 2);
jit_stxi(OFFSET, BASE, JIT_R0);

then:

(gdb) b <<somewhere before execution but after jit_emit>>
(gdb) r > /tmp/log.txt
...
(gdb)

in another terminal, open /tmp/log.txt and search for :foo:1 and :foo:2;
then, back at the gdb prompt, add breakpoints, e.g.:

(gdb) b *0x123455  <-- :foo:1 instruction
(gdb) b *0x123456  <-- :foo:2 instruction
(gdb) c

at the first breakpoint, check that the instruction is correct, e.g.

mov    $0xa,%eax

this will not work on all architectures, but on x86 you could watch
the register value, e.g.:

(gdb) watch $rax

but your problem looks like memory being overwritten, so watching the
memory should be good enough: figure out the proper address at the
second breakpoint, e.g. assuming something like:

jit_stxi(OFFSET, BASE, JIT_R0);
being translated to
mov %rax,0x8(%rbx)

(gdb) p $rbx+8
0xabcdef
(gdb) watch *(long*)0xabcdef

if it is jit code doing the overwriting, it may not be easy to debug, and
you may need to add more jit_note calls to help locate where the code that
overwrites the memory is being generated.

> Thank you,
> Bruno

Thanks,
Paulo


