
Re: [Lightning] About work on possible mips port


From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] About work on possible mips port
Date: Wed, 6 Oct 2010 06:10:40 -0300

On 6 October 2010 04:39, Paolo Bonzini <address@hidden> wrote:
> On 10/06/2010 06:31 AM, Paulo César Pereira de Andrade wrote:
>>
>>   At first I am thinking about using the approach I used for function
>> calls, but reverted so that it is not the default, that is, to patch a
>> stack pointer adjustment on demand, i.e. as a result of jit_allocai or of
>> function calls that require arguments on the stack. There are no push/pop
>> opcodes anyway, but it would need to assert that no more than 32767 bytes
>> are allocated, to make patching possible.
>
> I initially said I don't like the idea of patching the prolog on jit_allocai
> or function calls, however I now noticed that the PowerPC port is doing that
> for jit_allocai.  So, for that case I now think patching the prolog is good.

  For allocai I could see no reason not to patch the prolog, as one cannot
expect reliable results when jumping between code generated after different
jit_prolog calls, unless jit_allocai is called with the same value after
every jit_prolog. This is why I kept patching the stack adjustment, but only
for jit_allocai. For function arguments, using the alternate logic now
requires something like:
#if defined(jit_push_pop_p)
    jit_flags.push_pop = 0;
#endif
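
  To make the prolog patching idea concrete for a MIPS port, here is a
minimal sketch. It assumes the prolog emits a placeholder
"addiu $sp, $sp, 0" that is rewritten later; patch_stack_adjust and
ADDIU_SP_SP are hypothetical names, not part of lightning, and the
8-byte alignment check is an assumption about the ABI in use.

```c
#include <assert.h>
#include <stdint.h>

/* MIPS I-type encoding of "addiu $sp, $sp, imm16" with imm16 = 0. */
#define ADDIU_SP_SP 0x27BD0000u

/* Hypothetical helper: rewrite the placeholder emitted by the prolog so
 * that it reserves `bytes` of stack.  The immediate is a signed 16-bit
 * field, which is where the 32767 byte limit comes from. */
static void patch_stack_adjust(uint32_t *insn, int32_t bytes)
{
    /* Check that we really are patching an "addiu $sp, $sp, imm". */
    assert((*insn & 0xFFFF0000u) == ADDIU_SP_SP);
    assert(bytes >= 0 && bytes <= 32767);
    /* Assumed ABI requirement: keep the stack 8-byte aligned. */
    assert((bytes & 7) == 0);
    /* The stack grows down, so store -bytes in the low 16 bits. */
    *insn = ADDIU_SP_SP | ((uint32_t)(-bytes) & 0xFFFFu);
}
```

Patching a placeholder with 32 bytes, for example, yields the word
0x27BDFFE0, i.e. "addiu $sp, $sp, -32".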

> For function calls, however, it would be much better (and portable) to
> adjust the stack pollution support so that instead of
>
>         push %eax
>         push %ebx
>         call f1           ; jit_calli
>         push %eax
>         push %ebx
>         call f2           ; jit_finish
>         add $16, %esp

Question: do you mean

        jit_prepare(4);
        jit_pusharg_i(_RAX);
        jit_pusharg_i(_RBX);
        jit_calli(f1);
        jit_pusharg_i(_RAX);
        jit_pusharg_i(_RBX);
        jit_finish(f2);

or direct calls to jit_pushr_i and jit_addi_p?

> it generates
>
>         push %eax
>         push %ebx
>         call f1
>         mov %eax, 4(%esp)
>         mov %ebx, (%esp)
>         call f2
>         add $8, %esp

  jit_calli and jit_callr assume that the function being called takes
zero arguments, or that the programmer knows what he is doing and will
push the arguments and restore the state correctly.

  To explain this better, take this script for the testing tool:
-%<-
.data   32
i:
.c      "%d\n"
ii:
.c      "%d %d\n"
iii:
.c      "%d %d %d\n"
.code   256
        prolog 0
        movi_i %v0 0
        movi_i %v1 1
        movi_i %v2 2
        movi_p %r0 i
        prepare 2
                pusharg_i %v0
                pusharg_i %r0
        finish @printf
        movi_p %r0 ii
        prepare 3
                pusharg_i %v1
                pusharg_i %v0
                pusharg_i %r0
        finish @printf
        movi_p %r0 iii
        prepare 4
                pusharg_i %v2
                pusharg_i %v1
                pusharg_i %v0
                pusharg_i %r0
        finish @printf
        ret
-%<-
and running it:

$ ./lightning x.tst -v
i:
         0x9d1a160      25 64 0a 00
ii:
         0x9d1a164      25 64 20 25 64 0a 00
iii:
         0x9d1a16b      25 64 20 25 64 20 25 64 0a 00
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
         0x9d1a248      push   %ebx
         0x9d1a249      push   %esi
         0x9d1a24a      push   %edi
         0x9d1a24b      push   %ebp
         0x9d1a24c      mov    %esp,%ebp
         0x9d1a24e      sub    $0x1c,%esp
         0x9d1a254      xor    %ebx,%ebx
         0x9d1a256      mov    $0x1,%esi
         0x9d1a25b      mov    $0x2,%edi
         0x9d1a260      mov    $0x9d1a160,%eax
         0x9d1a265      mov    %ebx,0x4(%esp)
         0x9d1a269      mov    %eax,(%esp)
         0x9d1a26c      call   0xb76a5050 # @printf
         0x9d1a271      mov    $0x9d1a164,%eax
         0x9d1a276      mov    %esi,0x8(%esp)
         0x9d1a27a      mov    %ebx,0x4(%esp)
         0x9d1a27e      mov    %eax,(%esp)
         0x9d1a281      call   0xb76a5050 # @printf
         0x9d1a286      mov    $0x9d1a16b,%eax
         0x9d1a28b      mov    %edi,0xc(%esp)
         0x9d1a28f      mov    %esi,0x8(%esp)
         0x9d1a293      mov    %ebx,0x4(%esp)
         0x9d1a297      mov    %eax,(%esp)
         0x9d1a29a      call   0xb76a5050 # @printf
         0x9d1a29f      leave
         0x9d1a2a0      pop    %edi
         0x9d1a2a1      pop    %esi
         0x9d1a2a2      pop    %ebx
         0x9d1a2a3      ret
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0
0 1
0 1 2

now, if I change it to:
-%<-
.flags  push_pop 1
.data   32
i:
.c      "%d\n"
ii:
.c      "%d %d\n"
iii:
.c      "%d %d %d\n"
.code   256
        prolog 0
        movi_i %v0 0
        movi_i %v1 1
        movi_i %v2 2
        movi_p %r0 i
        prepare 2
                pusharg_i %v0
                pusharg_i %r0
        finish @printf
        movi_p %r0 ii
        prepare 3
                pusharg_i %v1
                pusharg_i %v0
                pusharg_i %r0
        finish @printf
        movi_p %r0 iii
        prepare 4
                pusharg_i %v2
                pusharg_i %v1
                pusharg_i %v0
                pusharg_i %r0
        finish @printf
        ret
-%<-

the output is:

$ ./lightning x.tst -v
i:
         0x8db3160      25 64 0a 00
ii:
         0x8db3164      25 64 20 25 64 0a 00
iii:
         0x8db316b      25 64 20 25 64 20 25 64 0a 00
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
         0x8db3248      push   %ebx
         0x8db3249      push   %esi
         0x8db324a      push   %edi
         0x8db324b      push   %ebp
         0x8db324c      mov    %esp,%ebp
         0x8db324e      sub    $0xc,%esp
         0x8db3254      xor    %ebx,%ebx
         0x8db3256      mov    $0x1,%esi
         0x8db325b      mov    $0x2,%edi
         0x8db3260      mov    $0x8db3160,%eax
         0x8db3265      sub    $0xc,%esp
         0x8db3268      mov    %ebx,(%esp)
         0x8db326b      push   %eax
         0x8db326c      call   0xb75f2050 # @printf
         0x8db3271      add    $0x10,%esp
         0x8db3274      mov    $0x8db3164,%eax
         0x8db3279      sub    $0x8,%esp
         0x8db327c      mov    %esi,(%esp)
         0x8db327f      push   %ebx
         0x8db3280      push   %eax
         0x8db3281      call   0xb75f2050 # @printf
         0x8db3286      add    $0x10,%esp
         0x8db3289      mov    $0x8db316b,%eax
         0x8db328e      push   %edi
         0x8db328f      push   %esi
         0x8db3290      push   %ebx
         0x8db3291      push   %eax
         0x8db3292      call   0xb75f2050 # @printf
         0x8db3297      add    $0x10,%esp
         0x8db329a      leave
         0x8db329b      pop    %edi
         0x8db329c      pop    %esi
         0x8db329d      pop    %ebx
         0x8db329e      ret
  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0
0 1
0 1 2

  The first one is 92 bytes and the second one is 87 bytes. The testing
tool defaults to my change of not using push_pop (the first one), but
the library default is to use push_pop, to avoid the risk of problems
with code that jumps between code generated after different jit_prolog
calls.

  But I removed support for stack pollution, in the sense that only one
jit_prepare() call is now allowed before a jit_finish(), and I added
assertions that the jit_pusharg_x calls match what was "declared" in
jit_prepare, jit_prepare_f and jit_prepare_d.
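
  The kind of bookkeeping such assertions need can be sketched as
follows. This is only an illustration: struct call_state and the
sketch_* functions are hypothetical and are not lightning's actual
data structures or macros.

```c
#include <assert.h>

/* Hypothetical call state; not lightning's actual structure. */
struct call_state {
    int active;    /* nonzero between prepare and finish */
    int declared;  /* number of arguments announced by prepare */
    int pushed;    /* number of arguments pushed so far */
};

static void sketch_prepare(struct call_state *s, int nargs)
{
    assert(!s->active);          /* only one prepare before a finish */
    s->active = 1;
    s->declared = nargs;
    s->pushed = 0;
}

static void sketch_pusharg(struct call_state *s)
{
    assert(s->active);                /* must follow a prepare */
    assert(s->pushed < s->declared);  /* no more than was declared */
    s->pushed++;
}

static void sketch_finish(struct call_state *s)
{
    /* Every declared argument must have been pushed. */
    assert(s->active && s->pushed == s->declared);
    s->active = 0;
}
```

With this, a second sketch_prepare before sketch_finish, or a missing
sketch_pusharg, trips an assertion instead of silently corrupting the
stack.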

>> 1. move pointer to register and do indirect jump/call
>
> That's what jit_calli does for PowerPC.

  I am mostly unsure because it would deliberately generate worse code.
One could expect the user to do this explicitly:
    jit_movi_p(JIT_R0, pointer_from_anywhere);
    jit_jmpr(JIT_R0);
rather than the library always doing it with no alternative, or there
could be some option for it. jit_movi_p will already have a "complex"
patching scheme, because it requires two instructions: one to load the
top 16 bits and another for the bottom ones. The patching code will also
need to read the opcode before patching, to decide the kind of patch,
for unconditional jumps as well...
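
  A minimal sketch of that two-instruction patch, assuming the pointer
is loaded with a lui/ori pair. emit_movi_p and patch_movi_p are
hypothetical helper names used only for illustration; only the lui and
ori encodings are taken from the MIPS instruction set.

```c
#include <assert.h>
#include <stdint.h>

#define OP_MASK 0xFC000000u
#define OP_LUI  0x3C000000u  /* lui rt, imm16 */
#define OP_ORI  0x34000000u  /* ori rt, rs, imm16 */

/* Emit "lui rt, hi16(value); ori rt, rt, lo16(value)". */
static void emit_movi_p(uint32_t code[2], uint32_t rt, uint32_t value)
{
    code[0] = OP_LUI | (rt << 16) | (value >> 16);
    code[1] = OP_ORI | (rt << 21) | (rt << 16) | (value & 0xFFFFu);
}

/* Read the opcodes first to confirm this is a lui/ori pair, then
 * replace only the two 16-bit immediates. */
static void patch_movi_p(uint32_t code[2], uint32_t value)
{
    assert((code[0] & OP_MASK) == OP_LUI);
    assert((code[1] & OP_MASK) == OP_ORI);
    code[0] = (code[0] & 0xFFFF0000u) | (value >> 16);
    code[1] = (code[1] & 0xFFFF0000u) | (value & 0xFFFFu);
}
```

ori zero-extends its immediate, so the two halves can be patched
independently; a lui/addiu pair would need the top half adjusted when
bit 15 of the bottom half is set.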

>>   Conditional branches must be limited to 18 bits distance, or, add
>> a jump over with an inverse condition and implement an unconditional
>> one...
>
> I think this is fine.  It's 256 KB after all.

  That is in bytes; counted in instructions it is 64 K :-)
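
  The range check itself could look like the sketch below, assuming the
usual MIPS encoding where the 16-bit signed offset counts instructions
relative to the delay slot. branch_reaches is a hypothetical name; when
it returns zero, the jump over with the inverse condition would be
needed.

```c
#include <assert.h>
#include <stdint.h>

/* Return nonzero if a conditional branch located at `from` can reach
 * `to`.  The offset is a signed 16-bit instruction count relative to
 * the delay slot (from + 4), i.e. an 18-bit signed byte distance. */
static int branch_reaches(uint32_t from, uint32_t to)
{
    int32_t disp = (int32_t)(to - (from + 4));

    /* Instructions are 4 bytes and must be aligned. */
    assert((from & 3) == 0 && (to & 3) == 0);
    return disp >= -32768 * 4 && disp <= 32767 * 4;
}
```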

>>   Still also need to find out how to properly implement carry
>> primitives, as there is only add/sub without side effects, and
>> signed add/sub that generate a trap on overflow.
>
> Something like this:
>
>        add destlo, src1lo, src2lo
>        sltu aux, destlo, src1lo        ; sgtu for subtraction
>
>        add desthi, src1hi, src2hi
>        add desthi, desthi, aux

  Thanks. Using traps would probably be a very bad idea, as it would
also mean that the registers are not updated if an overflow happens...
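
  For reference, the sequence above can be modelled in C as follows.
This is only a sketch of the technique with hypothetical function names
add64 and sub64; uint32_t arithmetic wraps around without trapping, so
it corresponds to using the non-trapping addu/subu forms.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit add built from 32-bit halves, carry recovered with an
 * unsigned compare (the sltu in the sequence above). */
static void add64(uint32_t *desthi, uint32_t *destlo,
                  uint32_t src1hi, uint32_t src1lo,
                  uint32_t src2hi, uint32_t src2lo)
{
    uint32_t lo  = src1lo + src2lo;  /* addu destlo, src1lo, src2lo */
    uint32_t aux = lo < src1lo;      /* sltu aux, destlo, src1lo    */

    *destlo = lo;
    *desthi = src1hi + src2hi + aux; /* two addu instructions       */
}

/* Subtraction: a borrow happened if the low result is greater than
 * the low minuend (the "sgtu" case). */
static void sub64(uint32_t *desthi, uint32_t *destlo,
                  uint32_t src1hi, uint32_t src1lo,
                  uint32_t src2hi, uint32_t src2lo)
{
    uint32_t lo  = src1lo - src2lo;
    uint32_t aux = lo > src1lo;

    *destlo = lo;
    *desthi = src1hi - src2hi - aux;
}
```

Note that the compare must use the original src1lo, so destlo cannot be
the same register as src1lo unless the operand is saved first.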

> Paolo

Paulo


