Subject: [Lightning] I need your help :-(
Date: Sun, 15 Jun 2008 10:58:40 -0400
I found another problem, this time with the calling convention when using 4 args or more.
Let me show you:
I JIT one of my instructions as follows:
jit->movi_i(CJIT_R0,(_nbf+1) * sizeof(ColSlotI));
So... it is a call to a function (generated by lightning as well).
It takes 4 arguments: R0/R1 are integers while V0 and R2 are pointers.
Here is the assembly it generates:
(gdb) x/20i $rip
0x100030c359: mov $0x20,%eax
0x100030c35e: xor %ecx,%ecx
0x100030c360: mov $0x10002c3aa8,%rdx
0x100030c36a: mov %rbx,%r8
0x100030c36d: mov %rax,%r9
0x100030c370: mov %rcx,%r10
0x100030c373: mov %rdx,%r11
0x100030c376: mov $0x10002ffff8,%r11
0x100030c380: mov %r11,%rdi
0x100030c383: mov %r10,%rsi
0x100030c386: mov %r8,%rdx
0x100030c389: rex.WB callq *%r11
0x100030c38c: mov %eax,%ebx
It initializes eax and ecx (instructions 1/2), then loads rdx (R2, instruction 3). Then we get the 4 pushargs that supposedly load the input registers with the desired values (let's ignore the missing 32-bit to 64-bit conversion for now, I know how to fix that). So lightning first moves the registers into "temporary" ones, starting with R8: with 4 pushargs we use R8, R9, R10 and R11.
Then the CALL instruction of lightning always emits the (fixed) callee address into a register with a mov (register R11), which of course destroys what I had in R11 (a.k.a. rdx). Now the registers are "shifted" back from the temp locations to the input registers: R11 is written to RDI, R10 to RSI and R8 to RDX. We don't move all the registers either (only 3 movs when there are 4 arguments), so something may be off here too.
The most striking issue, though, is that the use of R11 as a dedicated register to load the address of the callee interferes with functions that have 4 input arguments. There is no provision for that now. It seems that the best lightning can do (safely) is 3 arguments. I looked up the Intel documentation and this bit:
indicates that CALL r/m64 should be possible (calling with an immediate that is 64-bit wide).
I'm still too fresh on the low-level instruction encoding to fix that though. (Getting rid of the use of R11 as a temp for the callee address).
I figured that we ought to add a macro similar to:
#define CALLQsr(R) (_REXQrr(0, R), _O_Mrm (0xff ,_b11,_b010,_r8(R) ))
with a _REXQrm (rather than rr), but I have no idea how to twiddle the bits of the instruction to state that the operand is a 64-bit wide immediate.
Any help with this greatly appreciated as always!
(I have some patches I'll pass along as well --for small things -- as soon as I'm done with this).
PS/ Getting to 4 args would be enough for me, but lightning has a more flexible API and x86_64 allows an arbitrary number of args and spilling on the stack.