lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/9] mips: Fill delay slots v2


From: Paulo César Pereira de Andrade
Subject: Re: [PATCH v2 0/9] mips: Fill delay slots v2
Date: Sat, 14 Jan 2023 06:45:25 -0300

Em sex., 13 de jan. de 2023 às 13:00, Paulo César Pereira de Andrade
<paulo.cesar.pereira.de.andrade@gmail.com> escreveu:
>
> Em qua., 11 de jan. de 2023 às 10:04, Paul Cercueil
> <paul@crapouillou.net> escreveu:
> >
> > Hi Paulo,
>
>   Hi Paul,
>
>   Sorry for the delay in the response.
>
> > Le mardi 10 janvier 2023 à 13:30 -0300, Paulo César Pereira de Andrade
> > a écrit :
> > > Em seg., 9 de jan. de 2023 às 20:04, Paul Cercueil
> > > <paul@crapouillou.net> escreveu:
> > > >
> > > > Hi Paulo,
> > >
> > >   Hi Paul,
> > >
> > > > I switched to a mips64 toolchain and I could reproduce the build
> > > > issues.
> > > > Here's a revised patchset with most of them addressed.
> > > >
> > > > There was a combination of problems: the first one being that my
> > > > opcode_writes_register() was incomplete (missed all of the MIPS64
> > > > opcodes and some MIPS32 opcodes as well).
> > > >
> > > > The second problem was that my "no_flag" was incorrectly set after
> > > > a
> > > > jit_code_note, jit_code_name or jit_code_prolog.
> > > >
> > > > With this patchset applied I now have all the checks passing
> > > > successfully, except for the "clobber" one. I am not sure what's
> > > > going
> > > > on with this one, and it's hard to debug - the disassembly results
> > > > in a
> > > > 30 MiB file, with too many differences to "diff" - because filling
> > > > the
> > > > delay slots causes the code to be smaller.
> > >
> > >   I suggest attempting to bisect the problem by editing clobber.tst
> >
> > I found the problem:
>
>   Your patch moves the instructions in a way the code that keeps track
> of live registers does not understand, as it moves it out of a known live
> range of a register.
>
>   Here the 'at' register was chosen with 'jit_get_reg(jit_class_gpr)'
>
> >  ori    at,at,0x4
> >  dsll   at,at,0x10
> >  ori    at,at,0x1370
> > -lb     v0,0(at)
>
>   And after the 'lb' the 'at' register was released with jit_unget_reg();
>
>   at is again chosen with 'jit_get_reg(jit_class_gpr|jit_class_nospill)'
> to be used for the comparison.
>
> >  lui    at,0x1
> >  ori    at,at,0x1
> >  beq    v1,at
>
> and here again, jit_unget_reg() is used.
>
> > -nop
> > +lb     v0,0(at)
>
>   The 'at' register is not considered live due to the instruction reorder.
>
> > Here the code sees that the LB only writes v0, which is not read by the
> > BEQ, so the LB can be moved to the delay slot. However the 'at'
> > register is still considered dead after the removed LB, while it should
> > actually be marked live, which means jit_beqi() wouldn't use 'at' as
> > its temporary.
>
>   It would be far more complex if multiple instructions were in the
> range of the reorder. Since the patch only reorders the instruction
> before the branch and the delay slot after the branch, it should be
> enough to 'infer' registers live in the instruction being moved.
>
>   By 'infer' live state, I mean, read the input registers of the instruction
> before the branch by decoding the instruction and getting the values
> of the register(s) if any. Note that it needs the 'symbolic' value, and
> extra code (a vector) to have a _Rx_REGNO to _Rx mapping is required.
>
>   The problem is the code in the range from
>
>  reg = jit_get_reg(jit_class_gpr|jit_class_nospill) to jit_unget_reg(reg).
>
>   Knowing the live registers in the instruction to be reordered, for every
> one of them, execute:
>
> /* force registers to be considered live if any */
> rx = jit_get_reg(rx|jit_class_gpr|jit_class_named|jit_class_nospill);
> ...
>
> /* Now get a free temporary for the branch */
> reg = jit_get_reg((jit_class_gpr|jit_class_nospill|jit_class_chk);
>
>   and use the 'reg' value for the 'movi' as already implemented.
>
>   Note the use of jit_class_chk. If there is no register free without
> a spill/reload (JIT_NOREG is returned), should give up in the instruction
> reorder.
>
>   Now, after 'reg' is released:
>
> jit_unget_reg(reg);
>
> for every register that must be considered live, if any, run:
> jit_unget_reg(rx);
>
> > I'm not sure what's the best way to fix that, I could add a
> > 'op_reads_register' function and mark all input registers of the opcode
> > as live at the point where it's removed, but maybe there is a better
> > solution? I'm already under the impression that I'm reinventing the
> > wheel with op_writes_register().
>
>   As long as it is only for the reorder of the delay slot explicitly added
> after the branch with the instruction before the branch, the logic above
> would be enough.
>
>   There is no easier solution because at the start of the function
> implementing the branch, the bitmaps of registers state in the
> previous Lightning to machine code translantion instruction have
> been already updated, and backtracking would be even more
> complicated.

  Actually, the problem should be very easy to fix. Only _AT should
be used as a temporary in the conditions exercised by the patch.
  Checking if _AT is an input register for the instruction before the
branch should be enough.
  If it is used, can use a variant of lib/jit_s390.c:
static jit_int32_t
_jit_get_reg_but_zero(jit_state_t *_jit, jit_int32_t flags)
{
    jit_int32_t        reg;
    reg = jit_get_reg(jit_class_gpr);
    if (reg == _R0) {
    reg = jit_get_reg(jit_class_gpr|flags);
    jit_unget_reg(_R0);
    }
    return (reg);
}
but for _AT. If it returns JIT_NOREG, just fallback and do not
reorder.

  Also, the previous example:

/* force registers to be considered live if any */
rx = jit_get_reg(rx|jit_class_gpr|jit_class_named|jit_class_nospill);

is wrong. Should also use jit_class_chk. If it returns JIT_NOREG
it means the register is not a candidate to be allocated. Due to either:

o register is known to be live and nospill was used
o register is spilled
o register is an argument to the current lightning instruction

and then, only call jit_unget_reg if JIT_NOREG was not returned.

  Your patch can also be used for floating point branches. In this
case should extend op_writes_register() for float operations.

> > Cheers,
> > -Paul
> >
> >
> > >
> > >   You can also edit the macros to add an ending new line, and run
> > > something like:
> > >
> > > $ gcc -E -x c -D__WORDSIZE=64 -D__LITTLE_ENDIAN -D__mips__=1
> > > check/clobber.tst > check/clobber.i
> > >
> > > then, run:
> > >
> > > $ cd check
> > > $ make debug
> > > ...
> > > (gdb) b _jit_emit
> > > (gdb) run -v clobber.i >& clobber.txt
> > > ...
> > > (gdb) finish
> > >
> > > assuming you have a build with --enable-devel-disassembler, then, you
> > > can also add breakpoints in addresses, for example.:
> > >
> > > (gdb) b *0xdeadbeef
> > >
> > > and do single step to debug states. Could also just add a breakpoint
> > > in
> > > 'abort' and check from the assembly output the problem (might have
> > > registers overwritten at abort entry).
> > >
> > >   The clobber test case must pass.
> > >
> > >   It validates that JIT_Rx, JIT_Vx and JIT_Fx not used in the
> > > operation
> > > are not clobbered. That is, there are no unexpected side effects.
> > >   Actually, clobber.tst should be extended as there are new codes
> > > since
> > > it was implemented. It also only tests the minimal set of registers,
> > > 3 JIT_Rx,
> > > 3 JIT_Vx and 3 JIT_Fx.
> > >
> > > > Cheers,
> > > > -Paul
> > > >
> > > > Paul Cercueil (8):
> > > >   mips: Optimize jit_eqr / jit_eqi
> > > >   mips: Fill delay slots of JR opcodes in jit_jmpr
> > > >   mips: Fill delay slots of JALR opcodes in jit_callr
> > > >   mips: Fill delay slots of J in jit_jmpi
> > > >   mips: Fill delay slots in jit_beqr / jit_beqi
> > > >   mips: Fill delay slots in jit_bner / jit_bnei
> > > >   mips: Fill delay slots in jit_bgtr, jit_bgti, jit_bler, jit_blei
> > > >   mips: Fill delay slots in jit_bger, jit_bgei, jit_bltr, jit_blti
> > > >
> > > >  lib/jit_mips-cpu.c | 731 +++++++++++++++++++++++++----------------
> > > > ----
> > > >  lib/jit_mips.c     |  87 ++++--
> > > >  2 files changed, 466 insertions(+), 352 deletions(-)
> > > >
> > > > --
> > > > 2.39.0
> > >
> > > Thanks,
> > > Paulo
>
> Thanks,
> Paulo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]