Re: GNU Lightning 2.2.1 release


From: Paulo César Pereira de Andrade
Subject: Re: GNU Lightning 2.2.1 release
Date: Sat, 18 Feb 2023 11:07:36 -0300

On Sat, 18 Feb 2023 at 09:29, Paul Cercueil
<paul@crapouillou.net> wrote:
>
> Hi Paulo,

  Hi Paul,

> On Friday, 17 February 2023 at 16:23 -0300, Paulo César Pereira de
> Andrade wrote:
> > GNU lightning 2.2.1 released!
> >
> > GNU lightning is a library to aid in making portable programs
> > that compile assembly code at run time.
> >
> > Development:
> > http://git.savannah.gnu.org/cgit/lightning.git
> >
> > Download release:
> > ftp://ftp.gnu.org/gnu/lightning/lightning-2.2.1.tar.gz
> >
> >   GNU Lightning 2.2.1 main new features:
> >
> > o Variable stack framesize implemented for aarch64, arm, i686, mips,
> >   riscv, loongarch and x86_64. This means function calls use only
> >   the minimum required stack space for prolog and epilog.
> > o Optimization of prolog and epilog to not create a frame pointer if
> >   not required, and not even save and restore the stack pointer if
> >   not required on a leaf function. These features implemented for the
> >   ports with variable stack framesize.
> > o New clor, clzr, ctor and ctzr instructions, that count leading/trailing
> >   zeros/ones. These use the hardware implementation when available,
> >   otherwise fall back to a software implementation.
>
> That's great. I actually had an alpha version of a patch that added
> clzr but never finished it.
>
> I think you could add an extra one, clsr, "count leading sign bits".
> The fallback should be very easy:
>
> jit_rshi(rn(tmp), r1, __WORDSIZE - 1);
> jit_xorr(rn(tmp), r1, rn(tmp));
> jit_clzr(r0, rn(tmp));

  Yes. Fallback is simple. If I recall correctly, only arm64 has it in hardware:

https://developer.arm.com/documentation/dui0801/h/A64-General-Instructions/CLS

  I used it in the first version of clor for aarch64 while experimenting with
the instruction, but it required a branch, so I changed it to just invert the
bits and use clz:
https://git.savannah.gnu.org/cgit/lightning.git/commit/?id=561eed91500f2a31ed9d4305c91940e742613ba8
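
  To make the two tricks above concrete, here is a rough sketch in plain C
(not lightning code). The names clz64, clo64 and cls64 are made up for
illustration, and the loop in clz64 is only a reference implementation, not
the fallback lightning actually uses. cls64 follows the suggested fallback
and counts the sign bit itself; subtracting one gives the number of sign bits
after the MSB.

```c
#include <stdint.h>

/* Reference count-leading-zeros (hypothetical helper); returns 64 for 0. */
static int clz64(uint64_t x)
{
    int n = 0;
    if (x == 0)
        return 64;
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count leading ones: invert the bits, then count leading zeros. */
static int clo64(uint64_t x)
{
    return clz64(~x);
}

/* Count leading sign bits, including the sign bit itself.
 * The mask is all ones for a negative value and zero otherwise, which is
 * what an arithmetic right shift by 63 would produce. */
static int cls64(int64_t v)
{
    uint64_t u = (uint64_t)v;
    uint64_t sign = (uint64_t)0 - (u >> 63);
    return clz64(u ^ sign);
}
```

  With this definition both 0 and -1 give 64, since every bit equals the sign
bit.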

> Maybe adapted to only return the number of sign bits after the MSB to
> match GCC's __builtin_clrsb(), if it makes more sense.
>
> Speaking about fallbacks, the ones in place look very ineffective (e.g.
> the bit-swap to count trailing bits). I'm sure there are better
> algorithms; I'll have a look.

  It is not even in jit_fallback.c. It is a version with neither lookup
tables nor branches. I think the libgcc variants use lookup tables. This is
something to optimize.
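
  As one possible direction (just a sketch, not what the current fallback
does), trailing zeros can be counted without reversing the bits: isolate the
lowest set bit with x & -x and count the leading zeros of that. The names
below are hypothetical, and the clz helper is a plain reference loop standing
in for a hardware clz.

```c
#include <stdint.h>

/* Reference count-leading-zeros (hypothetical helper); returns 64 for 0. */
static int clz64(uint64_t x)
{
    int n = 0;
    if (x == 0)
        return 64;
    while (!(x >> 63)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Count trailing zeros: x & -x keeps only the lowest set bit, whose
 * position is 63 minus its number of leading zeros. Returns 64 for 0. */
static int ctz64(uint64_t x)
{
    if (x == 0)
        return 64;
    return 63 - clz64(x & (~x + 1));
}

/* Count trailing ones: invert the bits, then count trailing zeros. */
static int cto64(uint64_t x)
{
    return ctz64(~x);
}
```

  The zero check is the only branch here; whether it can be avoided depends
on what the target's clz returns for a zero input.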

   Bit swap would also be a good candidate for an extra Lightning instruction.
At least aarch64 and loongarch have a bit swap/invert instruction:
https://developer.arm.com/documentation/dui0801/h/A64-General-Instructions/RBIT
https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#_bitrev_wd
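
  For ports without such an instruction, a software version could look like
the sketch below: the usual swap of adjacent bits, then pairs, nibbles, bytes
and so on. This is plain C for a 64 bit word with a made-up name (bitrev64),
not code from lightning.

```c
#include <stdint.h>

/* Reverse the bit order of a 64 bit word with neither branches nor
 * lookup tables, by swapping progressively larger groups of bits. */
static uint64_t bitrev64(uint64_t x)
{
    x = ((x >> 1)  & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
    x = ((x >> 2)  & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
    x = ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
    x = ((x >> 8)  & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
    x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
    x = (x >> 32) | (x << 32);
    return x;
}
```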

> Also, you added SLL opcodes to "sign extend top 32 bits" on MIPS, but
> you do that if (__WORDSIZE == 32). What "top 32 bits" are we talking
> about there?

  It is a SLL(r0, r1, 0) that is supposed to sign extend the value. I do not
have access to any mips release 6 system, so I did not test the mips6_p()
code variant.

The documentation I used (MD00087-2B-MIPS64BIS-AFP-6.06.pdf) says:

"""
Format: CLO rd, rs                                 MIPS32
Purpose: Count Leading Ones in Word
To count the number of leading ones in a word.
...
Restrictions:
Pre-Release 6: To be compliant with the MIPS32 and MIPS64 Architecture,
software must place the same GPR number in both the rt and rd fields of the
instruction. The operation of the instruction is UNPREDICTABLE if the rt and
rd fields of the instruction contain different values. Release 6’s new
instruction encoding does not contain an rt field.

If GPR rs does not contain a sign-extended 32-bit value (bits 63..31 equal),
then the results of the operation are UNPREDICTABLE.
"""
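
  In C terms, the condition in that last paragraph can be sketched as below,
modelling a 64 bit register as a uint64_t. The function names are made up;
sext32 mimics what the sign extending SLL is meant to produce, and is_sext32
checks the "bits 63..31 equal" requirement.

```c
#include <stdint.h>

/* Sign extend the low 32 bits of a 64 bit register image. */
static uint64_t sext32(uint64_t reg)
{
    uint64_t low = reg & 0xFFFFFFFFULL;
    /* Flip bit 31, then subtract it back: this propagates bit 31 into
     * bits 63..32 without relying on signed conversions. */
    return (low ^ 0x80000000ULL) - 0x80000000ULL;
}

/* Nonzero if bits 63..31 are all equal, i.e. the register already holds
 * a sign-extended 32 bit value. */
static int is_sext32(uint64_t reg)
{
    return sext32(reg) == reg;
}
```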

  I made the Lightning 2.2.1 release to publish several bug fixes, but
I hope to add extra bit manipulation instructions. At least:

o bit invert
o popcount
o bit rotate
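
  As a rough idea of what software fallbacks for popcount and rotate could
look like, here is a plain C sketch for a 64 bit word. The names are
hypothetical and nothing here is taken from lightning itself.

```c
#include <stdint.h>

/* Population count without branches or lookup tables: sum bits in
 * pairs, then nibbles, then bytes, then add the bytes by multiplying. */
static int popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Rotate left; the count is reduced modulo 64 and the second shift is
 * masked so that a count of 0 never shifts by 64. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}

/* Rotate right, same conventions as rotl64. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}
```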

  But there are several others that are useful, like ways to create
bit patterns for any kind of masks. These could at least be used
internally to create constants with repeated patterns.
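
  To illustrate the kind of patterns meant here, a small plain C sketch with
made-up helper names: replicating a byte across a word, and building a
contiguous mask from a start bit and a length.

```c
#include <stdint.h>

/* Repeat one byte in all eight byte lanes of a 64 bit word. */
static uint64_t repeat_byte(uint8_t b)
{
    return (uint64_t)b * 0x0101010101010101ULL;
}

/* Mask with the n lowest bits set; n >= 64 gives all ones. */
static uint64_t low_mask(unsigned n)
{
    return n >= 64 ? ~(uint64_t)0 : (((uint64_t)1 << n) - 1);
}

/* Mask with len bits set, starting at bit lo; bits that would fall
 * beyond bit 63 are dropped. */
static uint64_t field_mask(unsigned lo, unsigned len)
{
    return lo >= 64 ? 0 : (low_mask(len) << lo);
}
```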

  If you have other suggestions for new instructions, please let me know :)

  One such instruction could be "multiply and add", available in several
cpus.

  In the long term, int128 and complex float/double could be added. I would
like to have them, but implementing them in all ports is not trivial, and would
require the concept of register pairs, currently only barely used for
qdiv/qmul, and only for the result pair, not as input.

  Maybe a way to inject machine code could also be added, just a memcpy
of a buffer. This would allow optimizations where lightning does
not generate good code: experiment with an assembler, then,
when happy with the code, inject it into the jit code.
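
  Very roughly, and only as a sketch of the idea, such an injection could be
a bounds checked memcpy into the code buffer. The code_buf type and
code_inject function below are hypothetical and do not exist in lightning;
a real version would also need to deal with alignment, labels and
relocations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of a code buffer being filled by the jit. */
typedef struct {
    uint8_t *base;  /* start of the buffer */
    size_t   size;  /* total capacity in bytes */
    size_t   used;  /* bytes emitted so far */
} code_buf;

/* Copy pre-assembled machine code at the current emit position.
 * Returns 0 on success, -1 if the bytes do not fit. */
static int code_inject(code_buf *cb, const void *bytes, size_t len)
{
    if (len > cb->size - cb->used)
        return -1;
    memcpy(cb->base + cb->used, bytes, len);
    cb->used += len;
    return 0;
}
```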

> Cheers,
> -Paul
>
> > o Correct several bugs with jit_arg_register_p and jit_putarg{r,i}{_f,_d}.
> >   These bugs were not noticed earlier due to an incorrect check for
> >   correctness in check/carg.c.
> > o Add rip relative addressing support for x86_64 and shorter signed 64
> >   bit constant load if the constant fits in a signed 32 bit integer.
> >   This significantly reduces generated code size.
> > o Correct bugs in branch generation code for ppc and sparc.
> > o Correct bug in signed 32 bit integer load in ppc 64 bits.
> > o Add short relative unconditional branches and calls to mips, reducing
> >   generated code size.
> > o And several extra minor optimizations.
> >

Thanks,
Paulo


