[Lightning] Re: Some questions about minor changes

From: Paulo César Pereira de Andrade
Subject: [Lightning] Re: Some questions about minor changes
Date: Fri, 24 Sep 2010 06:49:55 -0300

On 24 September 2010 05:54, Paolo Bonzini <address@hidden> wrote:
> On 09/24/2010 10:16 AM, Paulo César Pereira de Andrade wrote:
>>   About the questions:
>> o Is it a problem to use byte or word opcodes, i.e. will it cause some
>>   kind of register stall, or something similar, to set %al or test %ax
>>   when only that part is used? E.g. setting %al as the "hidden" count of
>>   xmm registers used when calling a varargs function, or testing %ax
>>   after fnstsw?
> I think it's fine as long as you read the small register after possibly
> writing the large one.  Writing the small register and reading the large
> one later, on the other hand, has performance penalties (a partial-register
> stall).

  I think it is OK then, as the ABI specification says the callee only
reads %al (or only cares about %al) when processing the varargs, to know
how many floating-point arguments were passed in xmm registers; and a
prototyped function should, at some point, write to %rax without reading
it first.

>> o Would it be a good idea to either add new interfaces, or change the
>>   existing ones? E.g. jit_prepare and jit_prolog would be better with 3
>>   arguments, to properly calculate the stack offsets when keeping the
>>   stack aligned at 16 bytes; otherwise, it requires not-so-clear logic
>>   to "intercept" the first call to jit_pusharg_t, jit_arg_t and/or
>>   jit_allocai.
> Can you explain exactly why?

  Example of calling a function:
prepare 7      <- use 6 gp registers and 1 stack slot
prepare_f 3    <- use 3 fp registers
prepare_d 9    <- use 5 remaining fp registers and 4 stack slots
finish function

In the first pusharg, it must first pad the stack to keep it aligned to
16 bytes, unless the number of stack slots is even, in which case no
padding is needed. Here there are 5 stack slots (1 + 4), so 8 bytes of
padding are required.

  The i386 code does a "defensive" stack alignment in jit_prolog. I
updated it to use the code that was OS/X specific, as it is also required
on Linux when using SSE, since there may be code that takes the address of
stack variables as SSE operands.
  The x86_64 code now does it on the first push* or the first allocai,
because it also needs the prolog_{f,d} calls to figure out which
arguments are on the stack.

  The reason is that each function first assumes it was called with the
stack aligned, and then it must also align the stack for the functions
it calls. I am not sure what else OS/X does, but the 16-byte alignment is
almost certainly due to SSE constraints. So far I have only hit this
issue when using stack variables as arguments to the gcc SIMD
abstraction, and when doing a scalar-op-vector operation using a stack
temporary, for example with code somewhat like:

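typedef signed char v16sc __attribute__((vector_size(16)));
void scalar_add_vector_c_cv(v16sc *u, char v, v16sc *w, int n) {
    v16sc cc; char *p = (char *)&cc; int i;
    for (i = 0; i < 16; i++) p[i] = v;  /* broadcast v into the stack temporary */
    while (n--) *u++ = cc + *w++;
}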

>>   I understand that, to some extent, I am adding significant extra
>> complexity to lightning, by adding support for any number of arguments,
>> for example.
> No, your work is much appreciated, even though I'm wondering myself if you
> wouldn't have made your life easier by using LLVM...

  Possibly, but then I would miss most of the fun :-) Also, as far as I
understand, LLVM is C++ and is meant for statically typed languages,
though it surely has some way to "plug in" logic to convert basic types,
e.g. jump to another code path when an int needs to be converted to an
mpz_t, or a double to a complex double.

> Paolo

