lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/9] mips: Fill delay slots v2


From: Paulo César Pereira de Andrade
Subject: Re: [PATCH v2 0/9] mips: Fill delay slots v2
Date: Fri, 13 Jan 2023 13:00:08 -0300

Em qua., 11 de jan. de 2023 às 10:04, Paul Cercueil
<paul@crapouillou.net> escreveu:
>
> Hi Paulo,

  Hi Paul,

  Sorry for the delay in the response.

> Le mardi 10 janvier 2023 à 13:30 -0300, Paulo César Pereira de Andrade
> a écrit :
> > Em seg., 9 de jan. de 2023 às 20:04, Paul Cercueil
> > <paul@crapouillou.net> escreveu:
> > >
> > > Hi Paulo,
> >
> >   Hi Paul,
> >
> > > I switched to a mips64 toolchain and I could reproduce the build
> > > issues.
> > > Here's a revised patchset with most of them addressed.
> > >
> > > There was a combination of problems: the first one being that my
> > > opcode_writes_register() was incomplete (missed all of the MIPS64
> > > opcodes and some MIPS32 opcodes as well).
> > >
> > > The second problem was that my "no_flag" was incorrectly set after
> > > a
> > > jit_code_note, jit_code_name or jit_code_prolog.
> > >
> > > With this patchset applied I now have all the checks passing
> > > successfully, except for the "clobber" one. I am not sure what's
> > > going
> > > on with this one, and it's hard to debug - the disassembly results
> > > in a
> > > 30 MiB file, with too many differences to "diff" - because filling
> > > the
> > > delay slots causes the code to be smaller.
> >
> >   I suggest attempting to bisect the problem by editing clobber.tst
>
> I found the problem:

  Your patch moves the instructions in a way the code that keeps track
of live registers does not understand, as it moves it out of a known live
range of a register.

  Here the 'at' register was chosen with 'jit_get_reg(jit_class_gpr)'

>  ori    at,at,0x4
>  dsll   at,at,0x10
>  ori    at,at,0x1370
> -lb     v0,0(at)

  And after the 'lb' the 'at' register was released with jit_unget_reg();

  at is again chosen with 'jit_get_reg(jit_class_gpr|jit_class_nospill)'
to be used for the comparison.

>  lui    at,0x1
>  ori    at,at,0x1
>  beq    v1,at

and here again, jit_unget_reg() is used.

> -nop
> +lb     v0,0(at)

  The 'at' register is not considered live due to the instruction reorder.

> Here the code sees that the LB only writes v0, which is not read by the
> BEQ, so the LB can be moved to the delay slot. However the 'at'
> register is still considered dead after the removed LB, while it should
> actually be marked live, which means jit_beqi() wouldn't use 'at' as
> its temporary.

  It would be far more complex if multiple instructions were in the
range of the reorder. Since the patch only reorders the instruction
before the branch and the delay slot after the branch, it should be
enough to 'infer' registers live in the instruction being moved.

  By 'infer' live state, I mean, read the input registers of the instruction
before the branch by decoding the instruction and getting the values
of the register(s) if any. Note that it needs the 'symbolic' value, and
extra code (a vector) to have a _Rx_REGNO to _Rx mapping is required.

  The problem is the code in the range from

 reg = jit_get_reg(jit_class_gpr|jit_class_nospill) to jit_unget_reg(reg).

  Knowing the live registers in the instruction to be reordered, for every
one of them, execute:

/* force registers to be considered live if any */
rx = jit_get_reg(rx|jit_class_gpr|jit_class_named|jit_class_nospill);
...

/* Now get a free temporary for the branch */
reg = jit_get_reg((jit_class_gpr|jit_class_nospill|jit_class_chk);

  and use the 'reg' value for the 'movi' as already implemented.

  Note the use of jit_class_chk. If there is no register free without
a spill/reload (JIT_NOREG is returned), should give up in the instruction
reorder.

  Now, after 'reg' is released:

jit_unget_reg(reg);

for every register that must be considered live, if any, run:
jit_unget_reg(rx);

> I'm not sure what's the best way to fix that, I could add a
> 'op_reads_register' function and mark all input registers of the opcode
> as live at the point where it's removed, but maybe there is a better
> solution? I'm already under the impression that I'm reinventing the
> wheel with op_writes_register().

  As long as it is only for the reorder of the delay slot explicitly added
after the branch with the instruction before the branch, the logic above
would be enough.

  There is no easier solution because at the start of the function
implementing the branch, the bitmaps of registers state in the
previous Lightning to machine code translantion instruction have
been already updated, and backtracking would be even more
complicated.

> Cheers,
> -Paul
>
>
> >
> >   You can also edit the macros to add an ending new line, and run
> > something like:
> >
> > $ gcc -E -x c -D__WORDSIZE=64 -D__LITTLE_ENDIAN -D__mips__=1
> > check/clobber.tst > check/clobber.i
> >
> > then, run:
> >
> > $ cd check
> > $ make debug
> > ...
> > (gdb) b _jit_emit
> > (gdb) run -v clobber.i >& clobber.txt
> > ...
> > (gdb) finish
> >
> > assuming you have a build with --enable-devel-disassembler, then, you
> > can also add breakpoints in addresses, for example.:
> >
> > (gdb) b *0xdeadbeef
> >
> > and do single step to debug states. Could also just add a breakpoint
> > in
> > 'abort' and check from the assembly output the problem (might have
> > registers overwritten at abort entry).
> >
> >   The clobber test case must pass.
> >
> >   It validates that JIT_Rx, JIT_Vx and JIT_Fx not used in the
> > operation
> > are not clobbered. That is, there are no unexpected side effects.
> >   Actually, clobber.tst should be extended as there are new codes
> > since
> > it was implemented. It also only tests the minimal set of registers,
> > 3 JIT_Rx,
> > 3 JIT_Vx and 3 JIT_Fx.
> >
> > > Cheers,
> > > -Paul
> > >
> > > Paul Cercueil (8):
> > >   mips: Optimize jit_eqr / jit_eqi
> > >   mips: Fill delay slots of JR opcodes in jit_jmpr
> > >   mips: Fill delay slots of JALR opcodes in jit_callr
> > >   mips: Fill delay slots of J in jit_jmpi
> > >   mips: Fill delay slots in jit_beqr / jit_beqi
> > >   mips: Fill delay slots in jit_bner / jit_bnei
> > >   mips: Fill delay slots in jit_bgtr, jit_bgti, jit_bler, jit_blei
> > >   mips: Fill delay slots in jit_bger, jit_bgei, jit_bltr, jit_blti
> > >
> > >  lib/jit_mips-cpu.c | 731 +++++++++++++++++++++++++----------------
> > > ----
> > >  lib/jit_mips.c     |  87 ++++--
> > >  2 files changed, 466 insertions(+), 352 deletions(-)
> > >
> > > --
> > > 2.39.0
> >
> > Thanks,
> > Paulo

Thanks,
Paulo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]