Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation

From:	Aurelien Jarno
Subject:	Re: [Qemu-devel] [PATCH v2 00/13] tcg/sparc v8plus code generation
Date:	Sat, 18 Jul 2015 23:18:45 +0200
User-agent:	Mutt/1.5.23 (2014-03-12)

On 2015-07-18 08:21, Richard Henderson wrote:
> On 07/17/2015 02:42 PM, Aurelien Jarno wrote:
> >On 2015-07-17 12:23, Aurelien Jarno wrote:
> >>On 2015-07-16 22:29, Richard Henderson wrote:
> >>>On 07/15/2015 09:54 PM, Aurelien Jarno wrote:
> >>>>While I understand why we need the new trunc_shr_i32 opcode for MIPS64
> >>>>(the 32-bit values must be kept sign-extended), I currently fail to
> >>>>see why it is needed for SPARC.
> >>>
> >>>As far as I recall, it improves code for extracting high parts of 64-bit
> >>>quantities.  Without this, we wind up with a 64-bit shift, requiring a
> >>>64-bit temp register, followed by the "real" truncate which can copy the
> >>>data to a 32-bit destination register.
> >>
> >>Ok, I understand the use case now. So it's not for correctness, but
> >>rather to generate more optimized code.
> >
> >OTOH, it means that we always have to go through a 32-bit register first
> >when truncating a 64-bit value.
> >
> >I mean we gain in the following case:
> >   shr_i64 t64, t64, i
> >   trunc_i64_i32 t32, t64
> >   ...
> >
> >But we lose in the following case:
> >   trunc_i64_i32 t32, t64
> >   neg t32, t32
> >   ...
> 
> Why do you believe we're using an extra temp here?  Certainly you can't "neg
> t32, t64" in any circumstance.

I haven't tried it, and I am not familiar with SPARC assembly, but I
guess the above code would be translated this way with a real
trunc op:

        shr    %g2, 32, %o0
        sub    %g0, %o0, %o1

With a trunc op translated into a move, we can directly get:

        sub    %g0, %g2, %o1
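
The reason the truncation can be folded away is that negation only propagates carries upward, so the low 32 bits of the result depend only on the low 32 bits of the input. A minimal C sketch of that property follows; the function names are hypothetical and only model the two sequences, they are not TCG or QEMU code:

```c
#include <stdint.h>

/* Hypothetical model of "trunc_i64_i32 t32, t64; neg t32, t32":
 * truncate to 32 bits first, then negate in 32-bit arithmetic. */
uint32_t neg_after_trunc(uint64_t t64)
{
    uint32_t t32 = (uint32_t)t64;   /* explicit truncation step */
    return 0u - t32;                /* 32-bit negate */
}

/* Hypothetical model of negating the 64-bit register directly and
 * only looking at the low 32 bits of the result afterwards. */
uint32_t neg_without_trunc(uint64_t t64)
{
    uint64_t r = (uint64_t)0 - t64; /* 64-bit negate, no prior truncation */
    return (uint32_t)r;             /* consumer reads the low half only */
}
```

Both functions return the same value for any input, which is why a truncate that is a plain move costs nothing here, while a truncate that must shift or canonicalize adds an instruction.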


> Anyway, this comes up most often with interfacing with the sparcv8plus
> calling convention, in which 64-bit quantities must be passed in 2
> registers.  Before, we'd emit code like
> 
>       shrx    %g2, 32, %g1
>       mov     %g1, %o0
>       mov     %g2, %o1
> 
> After, we're able to put the shift output directly to %o0.
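
For reference, the split of a 64-bit quantity into two 32-bit register values described in the quoted text can be sketched in C as below. This is only a model of the semantics, assuming that a combined shift-and-truncate op computes the low 32 bits of the shifted value; `model_trunc_shr` and `split_for_v8plus` are hypothetical names, not functions from the QEMU tree:

```c
#include <stdint.h>

/* Hypothetical model of a fused shift-right-and-truncate op:
 * shift the 64-bit input right by pos, keep the low 32 bits.
 * pos is assumed to be in the range 0..63. */
uint32_t model_trunc_shr(uint64_t v, unsigned pos)
{
    return (uint32_t)(v >> pos);
}

/* Split a 64-bit value into the two 32-bit halves that a calling
 * convention passing 64-bit quantities in register pairs needs. */
void split_for_v8plus(uint64_t v, uint32_t *hi, uint32_t *lo)
{
    *hi = model_trunc_shr(v, 32);   /* high half, written straight to its destination */
    *lo = model_trunc_shr(v, 0);    /* low half is a plain truncation */
}
```

With the fused form, the high half needs no intermediate 64-bit temporary: the shifted result can be written to the 32-bit destination in one step.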

What matters is that we get more optimized code in general, which is
the case. Given that TCG supports multiple architectures, I believe it
is difficult to always get the best possible code.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
address@hidden                 http://www.aurel32.net
