
Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination


From: Blue Swirl
Subject: Re: [Qemu-devel] [PATCH RFC 00/11] AREG0 elimination
Date: Sun, 15 May 2011 17:06:43 +0300

On Sun, May 15, 2011 at 5:03 PM, Aurelien Jarno <address@hidden> wrote:
> On Sun, May 15, 2011 at 04:42:05PM +0300, Blue Swirl wrote:
>> On Sun, May 15, 2011 at 4:02 PM, Aurelien Jarno <address@hidden> wrote:
>> > On Sun, May 15, 2011 at 03:37:00PM +0300, Blue Swirl wrote:
>> >> On Sun, May 15, 2011 at 3:14 PM, Laurent Desnogues
>> >> <address@hidden> wrote:
>> >> > On Sun, May 15, 2011 at 1:33 PM, Blue Swirl <address@hidden> wrote:
>> >> > [...]
>> >> >>> x86_64 uses r14 as TCG_AREG0. Despite the instructions being quite
>> >> >>> simple (only 2 movi_i32), the resulting code makes 2 accesses to env to
>> >> >>> save the two registers. Having to reload the env pointer each time to
>> >> >>> a register would clearly increase the size of this TB.
>> >> >>
>> >> >> I don't think TCG would be that simple, instead the pointer would be
>> >> >> loaded only once in this case.
>> >> >
>> >> > Assuming TCG was able to allocate a register for that,
>> >> > it would be live at most for one TB, so you'd have to
>> >> > load it at least once per TB, and with block chaining
>> >> > that wouldn't be efficient as you'd keep on reloading it.
>> >>
>> >> Yes, but if there are better uses, the register can be flushed. Now
>> >> this is not possible since the register is always unavailable.
>> >>
>> >
>> > What are the better uses that justify flushing a register that is going
>> > to be used three or four host asm instructions later?
>>
>> It would obviously replace something else determined by TCG.
>
> The register will be free only for a few host instructions. Could you
> please give a more concrete example of such a usage?
>
>> > In the current generated code, roughly one in every four instructions
>> > references TCG_AREG0, so this register is really needed very often.
>> >
>> > If you think TCG will be faster by having one more register in between,
>> > I suggest you first optimize tcg_reg_alloc(), which simply spills
>> > a random register, even if there are some allocated registers that won't
>> > be used until the end of the TB. You should also check how often
>> > TCG spills a register (in which case it would have benefited from one more
>> > register). It happens less than 2000 times when booting an emulated mips
>> > system on x86_64, while more than 160000 TBs are generated.
>>
>> Right, on a modern CPU with lots of registers, one additional register
>> won't be helpful, but on i386 the situation should be very different,
>> since there are very few registers.
>>
>
> On i386, I indeed get a lot more spilled registers, that is 340000. Still
> that number is not that high, it's less than two times per TB. If we
> consider that these register spills are pure loss (which is not always
> the case, sometimes the spilled register is actually never used later, so
> it's just an anticipated save), that's 4 loads/stores per TB.
>
> It means that, to compensate, the env register should not be loaded more
> than 4 times in a TB, which looks quite difficult to achieve given how
> often this register is used.
>
> Please also note that spilling globals currently needs access to the env
> pointer, which might not be loaded, so another register spill is needed to
> load it. This will make the code a lot more complex than now to avoid a
> deadlock (probably by spilling local temps first).

OK, this doesn't look so attractive after all.
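
The break-even estimate quoted above can be checked with a few lines of arithmetic. This is only a rough sketch of that reasoning, not a measurement: the function names are made up for illustration, each spill is assumed to cost one store plus one later reload (the "pure loss" case), and the i386 rate uses the "less than two per TB" bound stated in the thread, since the i386 TB count is not given.

```python
# Rough model of the spill-vs-reload trade-off discussed in the thread.
# All names here are hypothetical helpers, not QEMU/TCG functions.

def spills_per_tb(total_spills, total_tbs):
    """Average number of register spills per translation block (TB)."""
    return total_spills / total_tbs

def memory_ops_per_tb(spill_rate, ops_per_spill=2):
    """Memory accesses caused by spills per TB.

    Assumption: every spill is pure loss, i.e. one store to save the
    register plus one load to bring it back later.
    """
    return spill_rate * ops_per_spill

def env_reload_pays_off(env_loads_per_tb, budget):
    """Freeing AREG0 only breaks even if the env pointer is loaded no
    more often per TB than the spill traffic it could remove."""
    return env_loads_per_tb <= budget

# mips guest on an x86_64 host: fewer than 2000 spills, more than 160000 TBs.
x86_64_rate = spills_per_tb(2000, 160000)

# i386 host: the thread states fewer than two spills per TB; use 2 as the bound.
i386_budget = memory_ops_per_tb(2)

print(f"x86_64 host: at most {x86_64_rate} spills per TB")
print(f"i386 host: at most {i386_budget} spill loads/stores per TB")
print("5 env loads per TB pays off:", env_reload_pays_off(5, i386_budget))
```

With roughly one in four generated host instructions referencing TCG_AREG0, a budget of about four env loads per TB is used up quickly, which is the point made in the quoted reply.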


