lightning

Re: [PATCH] lightning: Only optimize movr for regular registers


From: Paulo César Pereira de Andrade
Subject: Re: [PATCH] lightning: Only optimize movr for regular registers
Date: Fri, 23 Jun 2023 12:52:45 -0300

On Fri, 23 Jun 2023 at 06:18, Paul Cercueil
<paul@crapouillou.net> wrote:
>
> Hi Paulo,
>
> [...]
>
>
> > > One improvement that I think would be great is an optimization pass
> > > that would reorganize the emitted opcodes (after jit_emit), for
> > > instance to move memory loads as early as possible (before the
> > > target registers are used) to reduce pipeline stalls. In the case of
> > > SuperH, this would be very beneficial, as the architecture is
> > > superscalar but not out-of-order, so the instructions have to be
> > > ordered cleverly to unlock the best performance.
> >
> >   I do not know how SuperH works, but one idea could be something
> > like what is done for the ia64 port, which keeps up to 3 pending
> > instructions and then tries to find the best way to run them in a
> > cycle. There are two bitmaps, one for registers and one for
> > predicates. You could probably do something similar, but with more
> > than 3 instructions, and flushing when encountering a conflict, label
> > or branch. Anything more complex would require another intermediate
> > representation and could end up being too costly.
>
> Thanks, that's helpful.

  Maybe some scheduling logic could be added. If I understand correctly,
it could mostly be done heuristically at the top level, by allocating a
register for temporaries, moving the immediate into this register early,
and using only the jit_coder* instructions, not the jit_codei* ones.
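
  As a rough illustration of the pending-instruction window quoted above
(register bitmaps, flushing on a conflict, label or branch), here is a
minimal sketch. The types insn_t and window_t, the WINDOW_MAX size and all
function names are hypothetical and are not part of Lightning or of the ia64
port; a real version would reorder and emit the pending instructions where
the flush happens.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_MAX 8              /* hypothetical window size (ia64 keeps 3) */

/* Hypothetical instruction record; not a Lightning type. */
typedef struct {
    uint32_t reads;               /* bitmap of registers read */
    uint32_t writes;              /* bitmap of registers written */
    int      barrier;             /* nonzero for a label or branch */
} insn_t;

typedef struct {
    insn_t   pending[WINDOW_MAX];
    int      count;               /* number of pending instructions */
    uint32_t reads;               /* union of reads of all pending insns */
    uint32_t writes;              /* union of writes of all pending insns */
    int      flushes;             /* flush counter, for illustration only */
} window_t;

/* Empty the window. A real scheduler would pick an order and emit here. */
static void window_flush(window_t *w)
{
    if (w->count)
        w->flushes++;
    w->count = 0;
    w->reads = 0;
    w->writes = 0;
}

/* Nonzero when 'in' cannot be freely reordered with the pending insns. */
static int window_conflicts(const window_t *w, const insn_t *in)
{
    return (in->reads  & w->writes) != 0    /* read after write  */
        || (in->writes & w->writes) != 0    /* write after write */
        || (in->writes & w->reads)  != 0;   /* write after read  */
}

/* Add one instruction, flushing first when it cannot join the window. */
static void window_push(window_t *w, const insn_t *in)
{
    assert(w->count >= 0 && w->count <= WINDOW_MAX);
    if (in->barrier || w->count == WINDOW_MAX || window_conflicts(w, in))
        window_flush(w);
    w->pending[w->count++] = *in;
    w->reads  |= in->reads;
    w->writes |= in->writes;
    if (in->barrier)
        window_flush(w);          /* nothing may move across a label/branch */
}
```

  Only the bookkeeping is shown. Choosing the order inside the window (for
example, loads first) is the part that would be specific to each port.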

> Another thing that I think could be better in Lightning, is forward
> branches. At the moment where the branch is generated, you do not know
> yet the exact target, so you have to emit opcodes that will support
> very far targets, which can be costly on some archs.
>
> I think it would be possible to compute the maximum distance to the
> target, using the maximum size of each Lightning instruction between
> the branch and the target. That's something that could be done arch-
> independently. Then the code emitters for the branches could use this
> information to deduce which opcode is best to use.

  There is no clear way to do it. There are the internal calli_p and
jmpi_p, but most ports handle it in their own internal way. As long as
you are not doing:

inst = jit_movi(reg, 0);
jit_jmpr(reg);
...
jit_patch_at(inst, target);

but just:

inst = jit_jmpi();
...
jit_patch_at(inst, target);

most ports already optimize by checking the jit buffer size and
attempting a shorter instruction. Since the port does not yet know the
forward branch target, it just checks whether a jump to the last byte in
the buffer would fit. It is possible to follow the list attempting to
guess the size, but the logic already over-estimates the generated code
size, and it could end up being too slow for code with many branches.
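
  To make "follow the list attempting to guess the size" concrete, here is
a minimal sketch of the worst-case distance estimate proposed in the quoted
text. The node_t type and its max_size field are hypothetical (Lightning's
real node type differs), and the short-branch range is whatever a given
port supports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical IR node; Lightning's real node type differs. */
typedef struct node {
    struct node *next;
    size_t       max_size;    /* worst-case encoded size of this node, bytes */
} node_t;

#define UNREACHABLE ((size_t)-1)

/* Upper bound, in bytes, on the distance from 'branch' to 'target',
 * walking forward. The branch's own worst-case size is included, which
 * keeps the estimate conservative. Returns UNREACHABLE when 'target'
 * is not found after 'branch'. */
static size_t max_forward_distance(const node_t *branch, const node_t *target)
{
    size_t total = 0;
    const node_t *n;

    for (n = branch; n != NULL; n = n->next) {
        if (n == target)
            return total;
        total += n->max_size;
    }
    return UNREACHABLE;
}

/* Nonzero when the estimate fits the port's short-branch range. */
static int fits_short_branch(size_t distance, size_t short_range)
{
    return distance != UNREACHABLE && distance <= short_range;
}
```

  Each call walks the list once, so doing this for every forward branch is
where the cost for branch-heavy code would come from.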

  See the 'case jit_code_calli:' and 'case jit_code_jmpi:' code in
most jit_${arch}.c. The pattern is:

word = _jit->code.length - (_jit->pc.uc - _jit->code.ptr);
if (some_test_to_validate_reachable_short_jump(word)) ...
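
  A self-contained sketch of that pattern follows, for readers without the
sources at hand. The jit_buf_t struct is a hypothetical stand-in for the
_jit fields used above, and the signed displacement width is an assumed
parameter; each port has its own test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the buffer fields of the JIT state. */
typedef struct {
    uint8_t *code_ptr;        /* start of the code buffer */
    size_t   code_length;     /* total buffer size in bytes */
    uint8_t *pc;              /* current emission position */
} jit_buf_t;

/* Bytes between the current position and the end of the buffer. */
static size_t bytes_remaining(const jit_buf_t *b)
{
    assert(b->pc >= b->code_ptr);
    assert((size_t)(b->pc - b->code_ptr) <= b->code_length);
    return b->code_length - (size_t)(b->pc - b->code_ptr);
}

/* Nonzero when a forward jump with a signed displacement of 'bits' bits
 * could reach the last byte of the buffer, so any forward target inside
 * the buffer is reachable with the short form. */
static int short_jump_reaches_end(const jit_buf_t *b, unsigned bits)
{
    size_t max_disp;

    assert(bits >= 2 && bits < 8 * sizeof(size_t));
    max_disp = ((size_t)1 << (bits - 1)) - 1;
    return bytes_remaining(b) <= max_disp;
}
```

  The test is pessimistic by design: it assumes the target could be the very
end of the buffer, which is why it needs no knowledge of the real target.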

> Cheers,
> -Paul

Thanks,
Paulo


