qemu-devel

Re: x86 TCG helpers clobbered registers


From: Richard Henderson
Subject: Re: x86 TCG helpers clobbered registers
Date: Sat, 5 Dec 2020 06:38:25 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 12/4/20 7:34 PM, Stephane Duverger wrote:
>> You can't just inject a call anywhere you like.  If you add it at
>> the IR level, then the rest of the compiler will see it and work
>> properly.  If you add the call in the middle of another operation,
>> the compiler doesn't get to see it and Bad Things Happen.
> 
> I do understand that, and surprisingly isn't it what is done in the
> qemu slow path ? I mean, the call to the helper is not generated at IR
> level but rather injected through a 'jmp' right in the middle of
> currently generated instructions, plus code added at the end of the
> TB.
> 
> What's the difference between the way it is currently done for the
> slow path and something like:
> 
> static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
> { [...]
>     tcg_out_tlb_load(s, addrlo, addrhi, mem_index, opc,
>                      label_ptr, offsetof(CPUTLBEntry, addr_write));
> 
>     /* TLB Hit.  */
>     tcg_out_qemu_st_filter(s, opc, addrlo, addrhi, datalo, datahi);
>     tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);

The difference is that the slow path is aware that there are input registers
that are live, containing data (addrlo, addrhi, datalo, datahi), which must be
stored into the arguments for the slow path call.  Those input registers (and
all other call-clobbered registers) are dead *after* the slow path call.

You are injecting your filter call while those input registers are still live.
They will be next used by the fast-path store.

That is a very significant difference.
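
To make the liveness point concrete, here is a toy model in C. It is not QEMU code: the register file, the poison value, and every toy_* function are made up for illustration. The only assumption it encodes is the one above: after any host call, every call-clobbered register may hold garbage.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: two call-clobbered "registers" and a tiny memory. */
enum { R_ADDR, R_DATA, NREGS };
enum { MEM_WORDS = 16 };
#define POISON UINT64_C(0xDEADBEEFDEADBEEF)

static uint64_t regs[NREGS];
static uint64_t mem[MEM_WORDS];

/* After any host call, assume every call-clobbered register holds garbage. */
static void clobber_regs(void)
{
    for (int i = 0; i < NREGS; i++) {
        regs[i] = POISON;
    }
}

/* Hypothetical slow-path helper: receives its inputs as arguments and
 * performs the store itself. */
static void toy_store_helper(uint64_t addr, uint64_t data)
{
    assert(addr < MEM_WORDS);
    mem[addr] = data;
    clobber_regs();
}

/* Hypothetical filter helper: looks at the access, then returns. */
static void toy_filter_helper(uint64_t addr, uint64_t data)
{
    (void)addr;
    (void)data;
    clobber_regs();
}

/* Slow-path shape: the live inputs are moved into call arguments, and
 * the registers are never read again after the call. */
static uint64_t toy_slow_path(uint64_t addr, uint64_t data)
{
    regs[R_ADDR] = addr;
    regs[R_DATA] = data;
    toy_store_helper(regs[R_ADDR], regs[R_DATA]);
    return mem[addr];
}

/* Injected-call shape: the registers are read again after the call,
 * by which time they have been clobbered. */
static uint64_t toy_fast_path_injected(uint64_t addr, uint64_t data)
{
    regs[R_ADDR] = addr;
    regs[R_DATA] = data;
    toy_filter_helper(regs[R_ADDR], regs[R_DATA]);
    /* The "direct store" now uses poisoned address and data. */
    mem[regs[R_ADDR] % MEM_WORDS] = regs[R_DATA];
    return mem[addr];
}
```

In this model toy_slow_path() leaves the expected value in memory, while toy_fast_path_injected() stores garbage to the wrong slot, because the store runs after the registers were clobbered.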

>> No, we generate code for a constant esp, as if by gcc's
>> -mno-push-args option. We have reserved TCG_STATIC_CALL_ARGS_SIZE
>> bytes of stack for the arguments (which is actually larger than
>> necessary for any of the tcg targets).
> 
> As this is done only at the TB prologue, do you mean that the TCG will
> never generate an equivalent to a push *followed* by a memory
> store/load ? Our host esp will never point to a last stacked word,
> issued by the translation of a TCG op ?

TCG will never generate a push for an argument register.  The only push outside
of the prologue stores a return address ahead of a jmp; together they act as a
"call" that returns to a different address.
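
As a rough illustration of the constant-esp scheme, here is a toy C model. The stack array, the TOY_* sizes, and the toy_* functions are all hypothetical, and the reserved size is an arbitrary stand-in, not the real value of TCG_STATIC_CALL_ARGS_SIZE. The sketch only shows the shape of the idea: the prologue moves the stack pointer once, and stack arguments are then written at fixed offsets from it rather than pushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy sizes, chosen arbitrarily for this sketch. */
enum { TOY_STATIC_CALL_ARGS_SIZE = 32, TOY_STACK_SIZE = 256 };

static unsigned char toy_stack[TOY_STACK_SIZE];
static size_t toy_sp;   /* index into toy_stack; the stack grows downward */

/* Prologue: move sp once to reserve the outgoing argument area. */
static void toy_prologue(void)
{
    toy_sp = TOY_STACK_SIZE - TOY_STATIC_CALL_ARGS_SIZE;
}

/* Store a stack argument at a fixed offset from the unchanged sp. */
static void toy_store_stack_arg(size_t offset, uint32_t value)
{
    assert(offset + sizeof(value) <= TOY_STATIC_CALL_ARGS_SIZE);
    memcpy(&toy_stack[toy_sp + offset], &value, sizeof(value));
}

/* Read a stack argument back, as a callee would. */
static uint32_t toy_load_stack_arg(size_t offset)
{
    uint32_t value;

    assert(offset + sizeof(value) <= TOY_STATIC_CALL_ARGS_SIZE);
    memcpy(&value, &toy_stack[toy_sp + offset], sizeof(value));
    return value;
}

/* Pass two stack arguments and report how far sp moved while doing so. */
static size_t toy_sp_delta_for_call(uint32_t a, uint32_t b)
{
    size_t before;

    toy_prologue();
    before = toy_sp;
    toy_store_stack_arg(0, a);
    toy_store_stack_arg(sizeof(uint32_t), b);
    return before - toy_sp;
}
```

Here toy_sp_delta_for_call() returns 0: both arguments land in the reserved area and sp never moves, which is the property being described.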


r~


