lightning
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 0/9] mips: Fill delay slots v3


From: Paulo César Pereira de Andrade
Subject: Re: [PATCH v2 0/9] mips: Fill delay slots v3
Date: Sun, 15 Jan 2023 09:54:09 -0300

Em dom., 15 de jan. de 2023 às 08:05, Paul Cercueil
<paul@crapouillou.net> escreveu:
>
> Hi Paulo,

  Hi Paul,

> Le samedi 14 janvier 2023 à 22:42 -0300, Paulo César Pereira de Andrade
> a écrit :
> > Em sáb., 14 de jan. de 2023 às 12:11, Paul Cercueil
> > <paul@crapouillou.net> escreveu:
> > >
> > > Hi Paulo,
> >
> >   Hi Paul,
> >
> >   Patches pushed as all tests pass.
>
> Great! ;)
>
> > > Here's the V3 of my patchset that attempts to fill branch delay
> > > slots on
> > > MIPS.
> > >
> > > All tests do pass now, except the "float.nodata" check, but this
> > > one
> > > fails in master as well.
> >
> >   What environment are you using for testing? It works in the mips
> > environments I am testing.
> >
> >   Please let me know the output of:
> >
> > $ cpp -dM </dev/null
>
> Just a mips64 toolchain built using buildroot.
>
> I believe you meant
> cpp -dM -E - < /dev/null

  It is a very recent gcc, and probably kernel. My guess is that the issue
is caused by some kind of unaligned access. Is it real hardware or some
emulator?

  Can you please run it as:

$ cd check
$ make debug
...
(gdb) r -d float.tst

and see what if it is just somehow creating a bad immediate or if
it is triggering some kind of fault, like misaligned or invalid instruction?

> See attachment.
>
> > > I implemented your suggestion and it worked just fine. There was
> > > one
> > > last problem though, which caused the "clobber" check to fail once
> > > again; code emitters that generated branches to the next
> > > instruction (to conditionally execute some opcodes) caused
> > > problems, as
> > > the next instruction was not detected as a jump target (no
> > > "jit_flag_patch" set). I solved this by just checking the previous
> > > node's code against a table of accepted values.
> >
> >   Not certain if I fully understand this. It also appears to be a
> > somewhat
> > fragile logic.
> >
> >   Everytime the delay slot is used, there should be a comment
> > telling that instruction is in the delay slot, for example:
> >
> > static void
> > _ltr_f(jit_state_t *_jit, jit_int32_t r0, jit_int32_t r1, jit_int32_t
> > r2)
> > {
> >     jit_word_t        w;
> >     C_OLT_S(r1, r2);
> >     w = _jit->pc.w;
> >     BC1T(0);
> >     /* delay slot */
> >     movi(r0, 1);
> >     movi(r0, 0);
> >     patch_at(w, _jit->pc.w);
> > }
>
> There is a comment indeed, but the comment is gone as soon as you
> compile the code.
>
> If you have a 'ltr_f' followed by a 'beqr', the code will try to find a
> delay slot for the 'beqr', and will detect that the 'movi(r0, 0)' (aka.
> the last opcode generated by 'ltr_f') can safely be moved, because
> there is nothing that can tell us that the last movi is a jump target.
>
> So I just blacklist 'ltr_f' as well as others so that the emitted code
> will always be correct.

  I understand now.  The problem I see is that it is not very clear from
the patch, and/or changes elsewhere could cause it to no longer be
required, or have some missing check if new lightning instructions
are added.

  The problem is that can_swap_ds() does not check the pattern of a
jump in 2 (for fpr comparison) or 3 (for compare-and-swap) instructions
before the last machine instruction that it attempts to move.

  Currently there isn't a clear way to pass parameters to jit generation.
  The closest is the jit_cpu_t for arm and x86. Maybe we could add a
jit_cpu_t for mips, and have an option to make the swap with delay
slot optional? This would make it easy to validate that if a bug is found
in the future, it can be easily verified that it is not a side effect of the
changes in these patches. That is, have a defined jit_cpu_t for mips
in include/lightning/jit_mips.h and then in lib/jit_mips.c:jit_get_cpu()
set it to a default value, and use the flag allowing or disallowing the
patch during jit generation.

> > so, you check the previous opcode for float comparison and cas{r,i}.
> > Shouldn't it also check for b{o,x}{add,sub}{r,i} codes? These also
> > end with a delay slot usage, and based on the patches and your
> > description, not checking them might create very difficult to debug
> > bugs in jit doing complex branch logic on carry/overflow.
>
> The b{o,x}{add,sub}{r,i} end up with a delay slot, and that's perfectly
> fine. The algorithm will detect that and won't try to move the opcode
> (that's the has_delay_slot() check in can_swap_ds()).
>
> Cheers,
> -Paul

Thanks,
Paulo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]