Re: [Lightning] Lightning extensions
From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] Lightning extensions
Date: Mon, 20 Sep 2010 23:42:33 -0300
On 11 September 2010 09:39, Paolo Bonzini <address@hidden> wrote:
> 2010/9/5 Paulo César Pereira de Andrade
> <address@hidden>:
>> o Some kind of peephole optimization. This is kind of tough,
>> and may become quite complex. Again, talking about
>> x86*, it could convert some "ldr_x r0 r1; alur_x r0 r2 r0"
>> into something more like "alum_x r0 r2 r1", where the
>> "m" modifier stands for (m)emory. To have a simple
>> peephole optimization it would require a different
>> approach to handle labels and patches, and that could
>> become very memory hungry, as it would most likely
>> need to store operations in lists, up to jit_flush_code(),
>> so, maybe the idea should be discarded by default...
>
> Yeah, this is a bit against the very idea of lightning and (before it) vcode.
>
> Everything else can be done.
>
> I'll try to review your patches soon!
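To make the peephole item quoted above concrete, here is a minimal
sketch in plain C. The insn_t list and the OP_* names are hypothetical
and are not lightning data structures; the point is only that the
rewrite needs the operations stored somewhere before they are emitted.
It deliberately ignores labels and patches, which is exactly the hard
part mentioned in the quote: a jump target between the two operations
would make the fusion invalid.

```c
#include <stddef.h>

/* Hypothetical mini-IR for illustration only (not lightning's API). */
typedef enum { OP_LDR, OP_ALUR, OP_ALUM } opcode_t;

typedef struct {
    opcode_t op;
    int dst;    /* destination register */
    int src1;   /* OP_LDR: address register; ALU ops: first operand */
    int src2;   /* ALU ops: second operand (register, or address
                   register for OP_ALUM); unused for OP_LDR */
} insn_t;

/* Rewrite "ldr rT, rA ; alur rT, rB, rT" as "alum rT, rB, rA", in
   place. Returns the new number of instructions. */
size_t peephole_fuse_load(insn_t *code, size_t n)
{
    size_t in = 0, out = 0;

    while (in < n) {
        if (in + 1 < n
            && code[in].op == OP_LDR
            && code[in + 1].op == OP_ALUR
            && code[in + 1].dst == code[in].dst    /* temporary is overwritten */
            && code[in + 1].src2 == code[in].dst   /* loaded value is consumed */
            && code[in + 1].src1 != code[in].dst)  /* other operand is not the load */
        {
            insn_t fused;
            fused.op = OP_ALUM;
            fused.dst = code[in].dst;
            fused.src1 = code[in + 1].src1;
            fused.src2 = code[in].src1;            /* address register of the load */
            code[out++] = fused;
            in += 2;
        } else {
            code[out++] = code[in++];
        }
    }
    return out;
}
```

With "ldr r0 r1; alur r0 r2 r0" as input this leaves a single
"alum r0 r2 r1" entry in the list.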
Well, I do not expect the patches to be reviewed as-is, as some of
them are basically complete rewrites, e.g. the "champion"
should be
$ git show 8aaaee|wc -l
9787
but there are others with around 4k lines of changes (the one above
added the extra jit_state_t argument to the core*.h and fp*.h inline
functions).
Nevertheless, if there is a chance of these changes going upstream,
please let me know what can be done. But my main goal was to
make it more usable for my own project, e.g. I use gmp/mpfr/mpc
for multiple precision because it would take a lot more work
to use/extend/update a bignum library I wrote several years ago.
I just did not expect all the issues I found with lightning; otherwise
I would probably have written my own (sorry, but the truth must be
told :-)...
The highlights of the i386/x86_64 code I have worked on so far:
o works on x86_64, including function calls, and jit functions,
with any number of arguments; varargs C functions are included,
e.g. it can call printf, but it uses the System V ABI and would
need adjustments for Windows at least
o uses sse on i386, if available, and uses st(0) for return values,
following the ABI
o more optimized code generation, avoiding use of the stack
when it can use the destination register as a "temporary"
o works with all register combinations; there were a few
jit_xyz that would work only with some register
combinations
The "incompatible" changes:
o use of jit_gpr_t and jit_fpr_t, and jit_state_t as described
in another email; incompatible in the sense that it may
require one- or two-line patches to code that generates
jit in multiple buffers and uses more obscure lightning
features
o Added the undocumented jit_rintr_{f,d}_{i,l} interfaces,
changed jit_roundr_{f,d}_{i,l} to round ties away from
zero, and updated the others to not add/sub one if the
float argument is NaN, [+-]Inf or just does not fit in an
integer; that is, they keep the 0x80000000 or
0x8000000000000000 result in those cases, which is usually
handy. For example, in my language I treat the latter
specially, using it as an overflow flag, and reconstruct
the operation using mpz_t functions (e.g. mpz_set_d).
o Changed/implemented for i386/x86_64 the undocumented
jit_abs_{f,d} and jit_sqrt_{f,d} to jit_absr_{f,d} and jit_sqrtr_{f,d},
that is, added the "r" modifier to follow the pattern of other
calls.
o Added a jit_get_cpu function that uses gcc's constructor
attribute so that it is called before main, and that basically
executes the "cpuid" opcode with the proper %eax or %rax value.
The function may also be called explicitly, as it has a static
variable to guard against multiple calls, so it can be called
explicitly when not using gcc. This also adds the undocumented
bitfield globals "jit_cpu" and "jit_flags" for i386/x86_64,
but these should only be read/modified for debugging
purposes. jit_flags is actually of interest: it currently
has only one field, which tells whether the code can assume
round to nearest or must adjust the fpu rounding mode on
float to integer conversions.
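As a rough illustration of the float to integer behaviour described
for jit_roundr_{f,d}_{i,l} in the list above, here is the same result
computed in plain C for the double to 32-bit case. This is only a
sketch of the semantics, not the generated code, and the function name
round_ties_away_d_i is made up for the example.

```c
#include <stdint.h>

/* Hypothetical helper: emulates the described result in portable C.
   Rounds to nearest with ties away from zero; returns 0x80000000
   (INT32_MIN) when the argument is NaN, +-Inf, or does not fit. */
int32_t round_ties_away_d_i(double x)
{
    int32_t t;
    double frac;

    /* NaN fails both comparisons; +-Inf and out-of-range values fail
       one of them. All of these keep the 0x80000000 result. */
    if (!(x > -2147483648.5 && x < 2147483647.5))
        return INT32_MIN;

    t = (int32_t)x;           /* truncates toward zero, now in range */
    frac = x - (double)t;     /* exact, with the sign of x */
    if (frac >= 0.5)
        t += 1;               /* tie or above: away from zero */
    else if (frac <= -0.5)
        t -= 1;
    return t;
}
```

A caller that treats 0x80000000 as an overflow flag, as described
above, would check for INT32_MIN and then redo the conversion with
something like mpz_set_d. Note that -2147483648.0 itself also produces
that value, so it takes the slow path too.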
My next goals:
o Review the sse code to not generate sse2 instructions
on sse/mmx-only cpus, as is done in fp-x87.h, which
has an i386 and an i686 path. There is already some
new, untested sse4.1 code generation.
o Possibly somewhat big changes that unfortunately I
can only test on i386, but it is really desirable to have
the low and high words of a multiplication; if it overflows,
the values needed to set up an mpz would already be
available in any language (like mine) that converts overflow
of integer math into bignums.
o Work on sin/cos/tan/exp/log etc. using the x87, even
for x86_64. Not needing to call the libm function means
the contents of the JIT_FPR(n) registers are unlikely to be
destroyed. The issue should be fairly small, usually
just ensuring the argument is in the 2^-64 to 2^+64
range as required by the fpu, but ensuring correctness
for all inputs may require significant work.
o Work on complex double (and float while at it) using
sse. For non sse it would require 2 registers, so, for
the sake of compatibility, probably better to require
2 sse registers for complex number operations as
well.
o Possibly work on "long long" for 32 bits. Partly
because it should be fairly simple compared
to other ideas, but mostly because in my language
I use it as the "generic" untyped integer value (a 64-bit
value) that is converted to an mpz_t on overflow;
having the overflow information directly in the jit
would mean far fewer costly tests for every integer
operation.
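To show what the multiplication item above is after, here is a small
plain C sketch that returns the low and high words of a 32-bit
multiplication together with an overflow flag. The mul_wide_t type and
mul_wide_i32 function are hypothetical names for this example and not
a proposed lightning interface; on i386 the same values come directly
from the imul result in %eax and %edx.

```c
#include <stdint.h>

/* Hypothetical result type for illustration only. */
typedef struct {
    uint32_t lo;        /* low 32 bits of the 64-bit product */
    int32_t  hi;        /* high 32 bits of the 64-bit product */
    int      overflow;  /* nonzero if the product does not fit in int32_t */
} mul_wide_t;

mul_wide_t mul_wide_i32(int32_t a, int32_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;   /* cannot overflow 64 bits */
    mul_wide_t r;

    r.lo = (uint32_t)p;                    /* conversion is modulo 2^32 */
    /* Assumes >> on a negative value is an arithmetic shift, which is
       what mainstream compilers implement. */
    r.hi = (int32_t)(p >> 32);
    r.overflow = (p < INT32_MIN || p > INT32_MAX);
    return r;
}
```

When overflow is set, hi and lo already hold the full product, so a
language that promotes to bignums can build the mpz from the two words
instead of redoing the multiplication.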
Also, I will probably merge the "work" branch into "master"
of my GitHub fork soon.
> Paolo
[This email is not as long as the patches I have for lightning :-)]
[Almost replied only to Paolo again, gmail "Reply to All" lies...]
Thanks,
Paulo