[Lightning] More on work on lightning

From: Paulo César Pereira de Andrade
Subject: [Lightning] More on work on lightning
Date: Sat, 25 Sep 2010 18:30:17 -0300


  Better code generation requires more knowledge of certain
constraints, like stack usage, flow control, liveness of registers,
aliases, etc.

  Lightning is meant to be fast, simple and easily retargetable. In
the work I am doing I do not want to break this, and I want to avoid
breaking the API as much as possible.

  How to get forward/context information?
1. Add some new call, like jit_hint(...)
2. Add a standard field to jit_state_t or jit_local_state to be filled
   by the programmer
3. Create a new abstraction, that uses lightning after parsing the
   calls. A good candidate, "almost ready to go" for a kind of
   intermediate representation is the pseudo assembler that I
   made to test lightning; lightning.c at

  A good example of use of required forward information is stack
usage in the i386 backend. Instead of using the pattern:

    printf("%d\n", JIT_R0);
    subi_l %sp %sp 12
    str_i %sp %r0
    movi_p %r0 "%d\n"
    pushr_l %r0
    calli @printf
    addi_l %sp %sp 16

it could have allocated stack space in the prolog for all calls
made in the function, and the above could have become:
    stxi_i 4 %sp %r0
    movi_p %r0 "%d\n"
    str_p %r0 %sp
    calli @printf

it is even more appealing when there are several sequential
function calls (like in my interpreter that currently uses
lightning mostly to glue calls to C functions).
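
  A minimal sketch of the bookkeeping this would need, assuming the
argument size of every call in the function is known before the prolog
is emitted. The names call_site_t, round_up and outgoing_area_bytes are
hypothetical and not part of lightning; the alignment value is whatever
the target ABI requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one call site: only the number of bytes of
 * stack arguments it passes matters for sizing the outgoing area. */
typedef struct {
    const char *callee;    /* name, for readability only */
    size_t      arg_bytes; /* bytes of stack arguments for this call */
} call_site_t;

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Size of the area a prolog would reserve once, so that every call in
 * the function can store its arguments relative to %sp instead of
 * pushing them and adjusting %sp again after the call. */
static size_t outgoing_area_bytes(const call_site_t *calls, size_t n,
                                  size_t align)
{
    size_t max = 0;
    for (size_t i = 0; i < n; i++)
        if (calls[i].arg_bytes > max)
            max = calls[i].arg_bytes;
    return round_up(max, align);
}
```

  With calls needing 8, 16 and 20 bytes and a 16 byte alignment this
reserves 32 bytes once; a function with no calls (the jit_leaf case)
reserves nothing.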

  This is a place where jit_leaf and jit_prolog would make
a difference, e.g. jit_leaf would work like the current approach,
but jit_prolog could allocate some stack space to use the
approach above. Actually, on i386 it could use the 12 bytes
added to pad the stack, instead of just setting the initial
alloca_slack for it. But it would be better to know whether the
function may call functions that take more than 12 bytes of stack
arguments, and preallocate that in jit_prolog.

  Another issue is things like:
    prepare 4
    xorr_i %r0 %r0
    pushr_i %r0
    xorr_i %r0 %r0
    pushr_i %r0
    xorr_i %r0 %r0
    pushr_i %r0
    xorr_i %r0 %r0
    pushr_i %r0
    finish @foo
    addi_l %sp %sp 16

Here every xorr_i after the first is redundant, because %r0 is
already zero, but removing them requires significant extra
information to understand what is going on.
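
  As a rough illustration of the kind of tracking involved, here is a
sketch over a toy instruction list. The insn_t type and the OP_ZERO,
OP_PUSH and OP_CLOBBER opcodes are hypothetical and not lightning's
representation. It assumes straight-line code only: any label or
branch target would have to reset the known-zero state, which is
exactly the flow control knowledge that is missing today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_REGS = 8 };

/* Hypothetical toy IR: zero a register (xorr_i %r %r), push a register,
 * or any other instruction that overwrites a register. */
typedef enum { OP_ZERO, OP_PUSH, OP_CLOBBER } opcode_t;

typedef struct {
    opcode_t op;
    int      reg;
} insn_t;

/* Compact code[] in place, dropping an OP_ZERO whose register is already
 * known to hold zero. Returns the new instruction count. */
static size_t drop_redundant_zeroing(insn_t *code, size_t n)
{
    bool known_zero[NUM_REGS] = { false };
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        insn_t in = code[i];
        assert(in.reg >= 0 && in.reg < NUM_REGS);
        switch (in.op) {
        case OP_ZERO:
            if (known_zero[in.reg])
                continue;               /* skip: register is already zero */
            known_zero[in.reg] = true;
            break;
        case OP_CLOBBER:
            known_zero[in.reg] = false; /* value no longer known */
            break;
        case OP_PUSH:
            break;                      /* reads the register, keeps state */
        }
        code[out++] = in;
    }
    return out;
}
```

  Applied to the four zero/push pairs above, this keeps the first
zeroing and the four pushes, five instructions instead of eight.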

[the above examples are close to, but do not match, the
pseudo assembler syntax, e.g. there is no pushr_x call,
only pusharg_x, nor can movi_p take a literal; it needs
a label as its argument]

  Another, somewhat unrelated, comment is about the
initial trigonometric functions using x87. Probably they
are not (easily?) available on other CPUs, and then there
is the whole problem of correctness. For example, the current
implementation should return the same result as a 32-bit
libc, but sin/cos at least are basically meaningless with
huge (actually not so huge) input. One possible way to reduce
the argument would be to use lookup tables with precise
multiples of pi at several intervals, e.g. pi*(2^10),
pi*(2^20), etc. The problem with something like sin(1e32)
is, first, that 1e32 is not represented exactly as 1 followed
by 32 zeros. Then the hardware calculates the remainder
using a pi with 66 bits of precision (but it appears this was
changed in Pentium and newer), so the result will be completely
different when verified against a 32 or 64 bit libm sin,
mpfr, etc., or just when changing from extended precision to
double precision, because that would affect the initial argument.

