Re: [Lightning] Lightning extensions

From: Paulo César Pereira de Andrade
Subject: Re: [Lightning] Lightning extensions
Date: Mon, 20 Sep 2010 23:42:33 -0300

On 11 September 2010 09:39, Paolo Bonzini <address@hidden> wrote:
> 2010/9/5 Paulo César Pereira de Andrade
> <address@hidden>:
>> o Some kind of peephole optimization. This is kind of tough,
>>  and may become quite complex. Again, talking about
>>  x86*, it could convert some "ldr_x r0 r1; alur_x r0 r2 r0"
>>  into something more like "alum_x r0 r2 r1", where the
>>  "m" modifier stands for (m)emory. To have a simple
>>  peephole optimization it would require a different
>>  approach to handle labels and patches, and that could
>>  become very memory hungry, as it would most likely
>>  need to store operations in lists, up to jit_flush_code(),
>>  so, maybe the idea should be discarded by default...
> Yeah, this is a bit against the very idea of lightning and (before it) vcode.
> Everything else can be done.
> I'll try to review your patches soon!
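
  For reference, a rough sketch of the kind of rewrite meant by the
"ldr_x"/"alur_x" to "alum_x" example quoted above. It assumes the
operations are buffered in an array before being emitted, which is
exactly what lightning does not do today; the struct, the enum and the
function name are all hypothetical and only illustrate the idea.

```c
#include <stddef.h>

/* Hypothetical buffered form of an operation; lightning emits code
 * directly and keeps no such list. */
enum op_kind { OP_LDR, OP_ALUR, OP_ALUM, OP_LABEL, OP_OTHER };

struct op {
    enum op_kind kind;
    int dst;  /* destination register */
    int a;    /* LDR: address register; ALUR/ALUM: first operand */
    int b;    /* ALUR: second operand register; ALUM: address register */
};

/* Fuse "ldr r0 r1; alur r0 r2 r0" into "alum r0 r2 r1".
 * Rewrites ops[] in place and returns the new number of entries. */
size_t peephole_fuse_load(struct op *ops, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n
            && ops[i].kind == OP_LDR
            && ops[i + 1].kind == OP_ALUR
            && ops[i + 1].b == ops[i].dst     /* loaded value is the 2nd operand */
            && ops[i + 1].dst == ops[i].dst   /* and dies when the result is stored */
            && ops[i + 1].a != ops[i].dst) {  /* 1st operand is not the loaded value */
            struct op fused = { OP_ALUM, ops[i].dst, ops[i + 1].a, ops[i].a };
            ops[out++] = fused;
            i++;                              /* skip the consumed alur */
        } else {
            ops[out++] = ops[i];
        }
    }
    return out;
}
```

  A label sitting between the two operations is its own entry here, so
the adjacency test already refuses to fuse across it; that is the part
that needs the different handling of labels and patches.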

  Well, I do not expect the patches to be reviewed as is, as some of
them are basically complete rewrites; e.g. the "champion"
should be

$ git show 8aaaee|wc -l

but there are others with changes of around 4k lines (the one above
adds the extra jit_state_t argument to the core*.h and fp*.h inline
functions).

  Nevertheless, if there is a chance of these changes going upstream,
please let me know what can be done. But my major goal was to
make it more usable for my own project, e.g. I use gmp/mpfr/mpc
for multiple precision because it would require a lot more work
to use/extend/update a bignum library I wrote several years ago.
I just did not expect all the issues I found with lightning, otherwise
I would probably have written my own (sorry but truth must be told :-)...

  The "spotlights" of the i386/x86_64 code I have worked on so far;
basically it does:

o work on x86_64, including calls to functions, and jit functions,
  with any number of arguments; varargs C functions included,
  e.g. it can call printf, but it uses the System V ABI and would
  need adjustments for Windows at least
o use sse on i386, if available, and use st(0) for return values,
  following the ABI.
o generate more optimized code, avoiding use of the stack
  when it can use the destination register as a "temporary"
o work with all register combinations; there were a few
  jit_xyz that would work only with some registers
  The "incompatible" changes:

o use of jit_gpr_t and jit_fpr_t, and jit_state_t as described
  in another email; incompatible in the sense that it may
  require one- or two-line patches to code that generates
  jit in multiple buffers, and uses more obscure lightning
o Added the undocumented jit_rintr_{f,d}_{i,l} interfaces,
  changed jit_roundr_{f,d}_{i,l} to round ties away from
  zero, and updated the others to not add/sub one if the
  float argument is NaN, [+-]Inf, or just does not fit in an
  integer; that is, the result stays 0x80000000 or
  0x8000000000000000 in those cases, which is usually
  handy. For example, in my language I treat the latter
  specially: I use it as an overflow flag, and reconstruct
  the operation using mpz_t functions (e.g. mpz_set_d).
o Changed/implemented for i386/x86_64 the undocumented
  jit_abs_{f,d} and jit_sqrt_{f,d} to jit_absr_{f,d} and jit_sqrtr_{f,d},
  that is, added the "r" modifier to follow the pattern of other
o Added a jit_get_cpu function that uses gcc's constructor
  attribute to run before main; it basically executes the
  "cpuid" instruction with the proper %eax or %rax value.
  The function has a static variable to prevent multiple
  runs, so it can also be called explicitly when not using
  gcc. This also adds the undocumented bitfield globals
  "jit_cpu" and "jit_flags" for i386/x86_64, but these should
  only be read/modified for debugging purposes. jit_flags
  is actually of interest: it currently has only one field,
  which tells whether the code can assume round to nearest
  or must adjust the fpu rounding mode on float to integer
  conversions.
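
  To make the jit_roundr_{f,d}_{i,l} behaviour described above concrete,
here is a plain C model of the 32-bit double case. It is only a sketch
of the intended result, not the generated code: the function name is
made up for illustration, and it assumes the "does not fit" result is
the 0x80000000 value (INT32_MIN) mentioned above.

```c
#include <stdint.h>

/* Hypothetical model of a double -> 32-bit integer round, ties away
 * from zero. NaN, +-Inf and out-of-range inputs give 0x80000000. */
int32_t round_ties_away_i32(double x)
{
    /* Written as a negated "in range" test so that NaN, which fails
     * every comparison, lands in the sentinel branch as well. */
    if (!(x > -2147483648.5 && x < 2147483647.5))
        return INT32_MIN;

    int32_t t = (int32_t)x;        /* truncates toward zero */
    double frac = x - (double)t;   /* fractional part, same sign as x */

    if (frac >= 0.5)               /* the "add one" step */
        t++;
    else if (frac <= -0.5)         /* the "sub one" step */
        t--;
    return t;
}
```

  A caller can then treat INT32_MIN as "maybe overflowed" and redo the
conversion with mpz_set_d. A genuine INT32_MIN result takes the same
slow path, which is harmless because the bignum path gives the same
value.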

  My next goals:

o Review the sse code so it does not generate sse2
  instructions on sse/mmx only cpus, as is done in fp-x87.h,
  which has an i386 and an i686 path. There is already some
  new, untested, sse4.1 code generation.
o Possibly somewhat big changes that unfortunately I
  can only test on i386: it is really desirable to have
  the low and high words of a multiplication; if it overflows,
  the values needed to set up an mpz would already be
  there, for any language (like mine) that converts overflow
  of integer math into bignums.
o Work on sin/cos/tan/exp/log etc. using the x87, even
  for x86_64. Not needing to call the libm function means
  the contents of the JIT_FPR(n) registers are unlikely to
  be destroyed; the issue should be somewhat small, usually
  just ensuring the argument is in the -2^63 to +2^63
  range required by the fpu trigonometric instructions,
  but ensuring correctness for all input may require
  significant work.
o Work on complex double (and float while at it) using
  sse. For non sse it would require 2 registers, so, for
  the sake of compatibility, probably better to require
  2 sse registers for complex number operations as
o Possibly work on "long long" for 32 bits. Mainly
  because it should be somewhat simple compared
  to other ideas, but mostly because in my language
  I use it as the "generic" untyped integer value (a 64
  bits value), that is converted to an mpz_t on overflow,
  but having the overflow information directly in the jit
  would mean far fewer costly tests for every integer
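
  As an illustration of the "low and high word of a multiplication"
item above, a minimal C sketch of the information the jit would hand
back for a 32-bit multiply. The struct and function names are
hypothetical; on i386 the one-operand imul already leaves both words
in %edx:%eax, so the jit would not need the 64-bit arithmetic used
here to model it.

```c
#include <stdint.h>

/* Hypothetical result of a widening 32x32 -> 64 bit signed multiply. */
struct mul_result {
    uint32_t lo;       /* low word of the product (bit pattern) */
    uint32_t hi;       /* high word of the product (bit pattern) */
    int      overflow; /* nonzero if the product does not fit in int32_t */
};

struct mul_result mul_wide_i32(int32_t a, int32_t b)
{
    /* The 64-bit product of two 32-bit values can never overflow. */
    int64_t p = (int64_t)a * (int64_t)b;
    struct mul_result r;

    r.lo = (uint32_t)p;                    /* low 32 bits */
    r.hi = (uint32_t)((uint64_t)p >> 32);  /* high 32 bits */
    r.overflow = (p < INT32_MIN || p > INT32_MAX);
    return r;
}
```

  When overflow is set, hi and lo together are the full two's
complement product, so a language that promotes to bignums can build
the mpz from them directly instead of redoing the multiplication.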

  Also, I will probably merge the "work" branch into "master"
of my GitHub fork soon.

> Paolo

[This email is not as long as the patches I have for lightning :-)]

[Almost replied only to Paolo again, gmail "Reply to All" lies...]

