Re: [Lightning] More on work on lightning

From: Paolo Bonzini
Subject: Re: [Lightning] More on work on lightning
Date: Mon, 27 Sep 2010 11:41:43 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100907 Fedora/3.1.3-1.fc13 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.3

On 09/26/2010 01:15 PM, Paulo César Pereira de Andrade wrote:
On 26 September 2010 04:22, Paolo Bonzini <address@hidden> wrote:
2010/9/25 Paulo César Pereira de Andrade
  How to get forward/context information?
2. Add a standard field to jit_state_t or jit_local_state to be filled
   by the programmer

That's possible.  However, I think this does not belong in lightning
at all.  lightning users could do inlining at a high level to ensure
big enough subroutines are generated and the prolog overhead is not
important.  Register allocation could be done at a higher level too,
and so could constant propagation.

   What do you suggest for "exporting" the 6 GPR argument registers and
the 8 XMM argument registers on x86_64? I mean, doing it through a
somewhat standard interface. One could just use the registers directly,
but maybe add a new JIT_ARG(n) and JIT_FPARG(n), plus
JIT_ARG_NUM and JIT_FPARG_NUM to be tested for availability?

What do you mean?

Arguments should be pushed without knowledge of argument registers, even though that's not perfect when you have a register allocator that can do coalescing.

   Since the i386 code always did a "sub 12,%esp", I converted it into
an explicit 32-bit immediate (that is, it does not call SUBLir, but inlines the
instruction generation so that the 8-bit immediate version is not used),
and patch the value on the fly if more space is needed, and then pass
arguments using jit_stxi_x from JIT_SP.

Nice. However, it does not work if you jump from one function to another (skipping the jit_prolog of the latter) to do tail-calling.
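
To make the patched-prolog idea above concrete, here is a minimal sketch in C. It is not lightning's code: code_buf, emit_sub_esp_imm32 and patch_imm32 are hypothetical names invented for illustration. It only shows the encoding trick, namely always emitting the long form of "sub $imm32, %esp" (bytes 81 EC followed by a 4-byte little-endian immediate) so that the frame size can be overwritten in place later without moving any code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code buffer for illustration; not lightning's jit_state_t. */
typedef struct {
    uint8_t bytes[64];
    size_t  len;
} code_buf;

/* Emit "sub $imm, %esp" in its long form: opcode 0x81, ModRM 0xEC (/5, %esp),
 * then a 32-bit immediate.  The short form (0x83 0xEC imm8) is deliberately
 * avoided so that the immediate can later be patched to any 32-bit value.
 * Returns the offset of the immediate inside the buffer. */
static size_t emit_sub_esp_imm32(code_buf *cb, uint32_t imm)
{
    size_t imm_off;
    int i;

    assert(cb->len + 6 <= sizeof cb->bytes);
    cb->bytes[cb->len++] = 0x81;
    cb->bytes[cb->len++] = 0xEC;
    imm_off = cb->len;
    for (i = 0; i < 4; i++)            /* x86 immediates are little-endian */
        cb->bytes[cb->len++] = (uint8_t)(imm >> (8 * i));
    return imm_off;
}

/* Overwrite a previously emitted 32-bit immediate, e.g. once the real
 * frame size is known.  The instruction length does not change. */
static void patch_imm32(code_buf *cb, size_t imm_off, uint32_t imm)
{
    int i;

    assert(imm_off + 4 <= cb->len);
    for (i = 0; i < 4; i++)
        cb->bytes[imm_off + i] = (uint8_t)(imm >> (8 * i));
}
```

A caller would keep the offset returned by emit_sub_esp_imm32 while generating the function body and call patch_imm32 once more stack space turns out to be needed. As noted above, this relies on every entry into the function passing through the patched prolog.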

   I also updated some code for float/double conversion/load to
make a jit_allocai call "on demand", and then, use jit_{ld,st}xi_x
from JIT_FP.

Have you timed performance? Stack operations are really really cheap on x86.

   That would be an option, probably a very good one, but probably
would break things badly because the same shared object needs
to call gmp/mpfr/X11/etc functions.

regparm passes parameters in caller-saved registers (EAX, EDX and ECX on i386) and can be applied per function. It is not like "-ffixed-xxx", which might change the ABI. However, there is a problem in that prepare/pusharg/finish does not understand the regparm calling convention.

   My interest in adding those was because they were just #if 0'ed, but
having a single opcode, possibly with some small tests, is tempting, as
it also means registers are saved. But the cost is very high: on average
well over 100 cycles for sin/cos and some others. Add the transfer through
the stack and it becomes more costly. I wonder if there will be reliable
trigonometric or transcendental functions on x86+sse, as sse I believe (and
actually the name implies :-) is not really meant for scientific programming,
but for multimedia.

No, that's not true anymore. SSE is just floating-point math done right. There are no trig/transcendental functions because they no longer make much sense in modern microarchitectures, and such instructions do not guarantee correct results, so they are tricky to use.

In code where performance really matters, the compiler could do better by vectorizing the loop and calling functions that compute a vector sine/cosine/log/exp. Where hardware does help is with reciprocal and square root, so SSE has instructions for those.
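
As a small illustration of the reciprocal and square-root support mentioned above, the following C sketch uses the standard SSE intrinsics from <xmmintrin.h>. It assumes an x86 target with SSE enabled, and the wrapper names (sse_sqrt, sse_rsqrt_approx, sse_rcp_approx) are made up for this example. Note that the reciprocal and reciprocal-square-root instructions return approximations (roughly 12 bits of precision), whereas the square root is correctly rounded.

```c
#include <assert.h>
#include <xmmintrin.h>   /* SSE intrinsics; assumes an x86 target with SSE */

/* Correctly rounded single-precision square root (SQRTSS). */
static float sse_sqrt(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

/* Fast approximate 1/sqrt(x) (RSQRTSS); only about 12 bits are accurate. */
static float sse_rsqrt_approx(float x)
{
    assert(x > 0.0f);    /* keep this sketch to the well-behaved domain */
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Fast approximate 1/x (RCPSS); only about 12 bits are accurate. */
static float sse_rcp_approx(float x)
{
    assert(x != 0.0f);
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}

/* Helper for comparing against an expected value with a tolerance. */
static int close_to(float a, float b, float tol)
{
    float d = a - b;
    return (d < 0.0f ? -d : d) <= tol;
}
```

When more accuracy is needed, the approximate results are usually refined with a Newton-Raphson step; that refinement is left out here to keep the sketch short.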

