Re: [Lightning] About work on possible mips port
From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] About work on possible mips port
Date: Wed, 6 Oct 2010 06:10:40 -0300
On 6 October 2010 04:39, Paolo Bonzini <address@hidden> wrote:
> On 10/06/2010 06:31 AM, Paulo César Pereira de Andrade wrote:
>>
>> At first I am thinking about using the approach I used, but reverted to
>> not make default, for function calls, that is to patch a stack pointer
>> adjustment on demand, i.e. as result of jit_allocai or function calls
>> that require arguments on stack. There is no push/pop opcodes
>> anyway, but would need to assert that no more than 32767 bytes
>> are allocated, to make patching possible.
>
> I initially said I don't like the idea of patching the prolog on jit_allocai
> or function calls, however I now noticed that the PowerPC port is doing that
> for jit_allocai. So, for that case I now think patching the prolog is good.
For allocai I could see no reason not to patch the prolog: one cannot
expect reliable results when jumping between code generated after different
jit_prolog calls, unless jit_allocai is called with the same value after
every jit_prolog. This is why I kept patching the stack adjustment, but
only for jit_allocai.
For function arguments, using the alternate logic now requires something
like:
#if defined(jit_push_pop_p)
jit_flags.push_pop = 0;
#endif
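To make the on-demand stack adjustment and its 32767-byte limit concrete,
here is a minimal sketch of how the patch could look on MIPS. It is only an
illustration under assumptions: the helper name patch_stack_adjust is
hypothetical and not part of lightning, and it assumes the prolog emitted a
placeholder "addiu $sp, $sp, 0" whose 16-bit signed immediate is filled in
later.

```c
#include <stdint.h>

/* addiu $sp, $sp, imm16: opcode 0x09, rs = rt = 29 ($sp). */
#define ADDIU_SP_SP 0x27bd0000u

/* Hypothetical helper (not a lightning API): patch the frame size into
   a placeholder "addiu $sp, $sp, 0" emitted by the prolog.
   Returns 0 on success, -1 if the size does not fit in the immediate
   or the word is not the expected instruction. */
static int patch_stack_adjust(uint32_t *insn, int32_t bytes)
{
    /* The matching epilog adds +bytes, so bytes must fit in a
       signed 16-bit immediate: at most 32767. */
    if (bytes < 0 || bytes > 32767)
        return -1;
    /* Read the opcode first to make sure we patch the right word. */
    if ((*insn & 0xffff0000u) != ADDIU_SP_SP)
        return -1;
    /* The stack grows down, so the prolog uses imm16 = -bytes. */
    *insn = ADDIU_SP_SP | ((uint32_t)(-bytes) & 0xffffu);
    return 0;
}
```

For example, patching a 32-byte frame turns 0x27bd0000 into 0x27bdffe0,
that is "addiu $sp, $sp, -32".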
> For function calls, however, it would be much better (and portable) to
> adjust the stack pollution support so that instead of
>
> push %eax
> push %ebx
> call f1 ; jit_calli
> push %eax
> push %ebx
> call f2 ; jit_finish
> add $16, %esp
Question: do you mean
jit_prepare(4);
jit_pusharg_i(_RAX);
jit_pusharg_i(_RBX);
jit_calli(f1);
jit_pusharg_i(_RAX);
jit_pusharg_i(_RBX);
jit_finish(f1);
or direct calls to jit_pushr_i and jit_addi_p?
> it generates
>
> push %eax
> push %ebx
> call f1
> mov %eax, 4(%esp)
> mov %ebx, (%esp)
> call f2
> add $8, %esp
jit_calli and jit_callr assume that the called function takes
zero arguments, or that the programmer knows what he is doing
and will push the arguments and restore the state correctly.
To explain better, take this script for the testing tool:
-%<-
.data 32
i:
.c "%d\n"
ii:
.c "%d %d\n"
iii:
.c "%d %d %d\n"
.code 256
prolog 0
movi_i %v0 0
movi_i %v1 1
movi_i %v2 2
movi_p %r0 i
prepare 2
pusharg_i %v0
pusharg_i %r0
finish @printf
movi_p %r0 ii
prepare 3
pusharg_i %v1
pusharg_i %v0
pusharg_i %r0
finish @printf
movi_p %r0 iii
prepare 4
pusharg_i %v2
pusharg_i %v1
pusharg_i %v0
pusharg_i %r0
finish @printf
ret
-%<-
and running it:
$ ./lightning x.tst -v
i:
0x9d1a160 25 64 0a 00
ii:
0x9d1a164 25 64 20 25 64 0a 00
iii:
0x9d1a16b 25 64 20 25 64 20 25 64 0a 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0x9d1a248 push %ebx
0x9d1a249 push %esi
0x9d1a24a push %edi
0x9d1a24b push %ebp
0x9d1a24c mov %esp,%ebp
0x9d1a24e sub $0x1c,%esp
0x9d1a254 xor %ebx,%ebx
0x9d1a256 mov $0x1,%esi
0x9d1a25b mov $0x2,%edi
0x9d1a260 mov $0x9d1a160,%eax
0x9d1a265 mov %ebx,0x4(%esp)
0x9d1a269 mov %eax,(%esp)
0x9d1a26c call 0xb76a5050 # @printf
0x9d1a271 mov $0x9d1a164,%eax
0x9d1a276 mov %esi,0x8(%esp)
0x9d1a27a mov %ebx,0x4(%esp)
0x9d1a27e mov %eax,(%esp)
0x9d1a281 call 0xb76a5050 # @printf
0x9d1a286 mov $0x9d1a16b,%eax
0x9d1a28b mov %edi,0xc(%esp)
0x9d1a28f mov %esi,0x8(%esp)
0x9d1a293 mov %ebx,0x4(%esp)
0x9d1a297 mov %eax,(%esp)
0x9d1a29a call 0xb76a5050 # @printf
0x9d1a29f leave
0x9d1a2a0 pop %edi
0x9d1a2a1 pop %esi
0x9d1a2a2 pop %ebx
0x9d1a2a3 ret
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0
0 1
0 1 2
now, if I change it to:
-%<-
.flags push_pop 1
.data 32
i:
.c "%d\n"
ii:
.c "%d %d\n"
iii:
.c "%d %d %d\n"
.code 256
prolog 0
movi_i %v0 0
movi_i %v1 1
movi_i %v2 2
movi_p %r0 i
prepare 2
pusharg_i %v0
pusharg_i %r0
finish @printf
movi_p %r0 ii
prepare 3
pusharg_i %v1
pusharg_i %v0
pusharg_i %r0
finish @printf
movi_p %r0 iii
prepare 4
pusharg_i %v2
pusharg_i %v1
pusharg_i %v0
pusharg_i %r0
finish @printf
ret
-%<-
the output is:
$ ./lightning x.tst -v
i:
0x8db3160 25 64 0a 00
ii:
0x8db3164 25 64 20 25 64 0a 00
iii:
0x8db316b 25 64 20 25 64 20 25 64 0a 00
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0x8db3248 push %ebx
0x8db3249 push %esi
0x8db324a push %edi
0x8db324b push %ebp
0x8db324c mov %esp,%ebp
0x8db324e sub $0xc,%esp
0x8db3254 xor %ebx,%ebx
0x8db3256 mov $0x1,%esi
0x8db325b mov $0x2,%edi
0x8db3260 mov $0x8db3160,%eax
0x8db3265 sub $0xc,%esp
0x8db3268 mov %ebx,(%esp)
0x8db326b push %eax
0x8db326c call 0xb75f2050 # @printf
0x8db3271 add $0x10,%esp
0x8db3274 mov $0x8db3164,%eax
0x8db3279 sub $0x8,%esp
0x8db327c mov %esi,(%esp)
0x8db327f push %ebx
0x8db3280 push %eax
0x8db3281 call 0xb75f2050 # @printf
0x8db3286 add $0x10,%esp
0x8db3289 mov $0x8db316b,%eax
0x8db328e push %edi
0x8db328f push %esi
0x8db3290 push %ebx
0x8db3291 push %eax
0x8db3292 call 0xb75f2050 # @printf
0x8db3297 add $0x10,%esp
0x8db329a leave
0x8db329b pop %edi
0x8db329c pop %esi
0x8db329d pop %ebx
0x8db329e ret
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0
0 1
0 1 2
The first one is 92 bytes and the second one is 87 bytes. The
testing tool defaults to my change that avoids push_pop (the first
one), but the library default is to use push_pop, to avoid the risk
of problems with code that jumps between code generated after
different jit_prolog calls.
However, I removed support for stack pollution, in the sense that
only one jit_prepare() call is allowed before a jit_finish(), and I
added assertions that the jit_pusharg_x calls match what
was "declared" in jit_prepare, jit_prepare_f and jit_prepare_d.
>> 1. move pointer to register and to indirect jump/call
>
> That's what jit_calli does for PowerPC.
I am mostly unsure because it would deliberately generate worse
code. One could instead expect the user to write explicitly:
jit_movi_p(JIT_R0, pointer_from_anywhere);
jit_jmpr(JIT_R0);
rather than always doing that with no alternative, or there could be
an option for it. jit_movi_p will already have a "complex" patching
scheme, because it requires two instructions: one to load the top 16
bits and another for the bottom ones. It will also need to read
the opcode before patching to decide the kind of patch, for
unconditional ones as well...
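As a rough sketch of the two-instruction jit_movi_p and its patching, the
code below assumes the pair is lui followed by ori, which is one common way
to load a 32-bit constant on MIPS32; the actual port may choose differently.
The helper names emit_movi_p and patch_movi_p are hypothetical and not
lightning APIs.

```c
#include <stdint.h>

#define OP_LUI 0x3c000000u  /* lui rt, imm16     (opcode 0x0f) */
#define OP_ORI 0x34000000u  /* ori rt, rs, imm16 (opcode 0x0d) */

/* Hypothetical helper: emit "lui reg, hi16; ori reg, reg, lo16".
   reg is a MIPS register number in the range 0..31. */
static void emit_movi_p(uint32_t code[2], unsigned reg, uint32_t addr)
{
    code[0] = OP_LUI | (reg << 16) | (addr >> 16);
    code[1] = OP_ORI | (reg << 21) | (reg << 16) | (addr & 0xffffu);
}

/* Hypothetical helper: rewrite the address loaded by an existing pair.
   Returns 0 on success, -1 if the words are not a lui/ori pair. */
static int patch_movi_p(uint32_t code[2], uint32_t addr)
{
    /* Read the opcodes first to decide whether this patch applies. */
    if ((code[0] & 0xfc000000u) != OP_LUI ||
        (code[1] & 0xfc000000u) != OP_ORI)
        return -1;
    /* Keep opcode and register fields, replace only the immediates. */
    code[0] = (code[0] & 0xffff0000u) | (addr >> 16);
    code[1] = (code[1] & 0xffff0000u) | (addr & 0xffffu);
    return 0;
}
```

Because ori zero-extends its immediate, the two halves can be patched
independently, without the carry correction that a lui/addiu pair would need.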
>> Conditional branches must be limited to 18 bits distance, or, add
>> a jump over with an inverse condition and implement an unconditional
>> one...
>
> I think this is fine. It's 256 KB after all.
That is in bytes; in instructions it is 64 K :-)
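The arithmetic behind that range can be sketched as a small check. This
assumes the standard MIPS32 branch encoding, a signed 16-bit offset counted
in 4-byte instructions relative to the instruction after the branch; the
function name branch_disp_fits is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a byte displacement (measured from
   the instruction following the branch) fits in the signed 16-bit word
   offset of a MIPS conditional branch, else 0. */
static int branch_disp_fits(int32_t disp_bytes)
{
    /* Instructions are 4 bytes, so the target must be word aligned. */
    if (disp_bytes & 3)
        return 0;
    /* 16-bit signed word offset: -32768..32767 instructions,
       that is -131072..131068 bytes. */
    int32_t words = disp_bytes / 4;
    return words >= -32768 && words <= 32767;
}
```

When the check fails, the emitter would fall back to the jump-over scheme
mentioned above: a branch with the inverse condition around an
unconditional jump.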
>> Still also need to find out how to properly implement carry
>> primitives, as there is only add/sub without side effects, and
>> signed add/sub that generate a trap on overflow.
>
> Something like this:
>
> add destlo, src1lo, src2lo
> sltu aux, destlo, src1lo ; sgtu for subtraction
>
> add desthi, src1hi, src2hi
> add desthi, desthi, aux
Thanks. Using traps would probably be a very bad idea, and
it also means the registers are not updated if an overflow
happens...
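For reference, the quoted add/sltu sequence can be modelled in plain C. This
is only a sketch: the type u64_pair and both function names are made up for
illustration, and the comments use addu/subu, the non-trapping forms, since
C unsigned arithmetic wraps the same way.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* Addition with carry, mirroring the quoted sequence. */
static u64_pair add_with_carry(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;            /* addu destlo, src1lo, src2lo */
    uint32_t carry = r.lo < a.lo;  /* sltu aux, destlo, src1lo    */
    r.hi = a.hi + b.hi;            /* addu desthi, src1hi, src2hi */
    r.hi += carry;                 /* addu desthi, desthi, aux    */
    return r;
}

/* Subtraction with borrow: the comparison is reversed. */
static u64_pair sub_with_borrow(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo - b.lo;            /* subu destlo, src1lo, src2lo */
    uint32_t borrow = r.lo > a.lo; /* sltu aux, src1lo, destlo    */
    r.hi = a.hi - b.hi;            /* subu desthi, src1hi, src2hi */
    r.hi -= borrow;                /* subu desthi, desthi, aux    */
    return r;
}
```

The low word wrapped around exactly when the result is smaller than an
operand (for addition) or larger than the minuend (for subtraction), so a
single unsigned comparison recovers the carry without any trap.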
> Paolo
Paulo