Re: [Qemu-devel] Potential to accelerate QEMU for specific architectures

From: Paolo Bonzini
Subject: Re: [Qemu-devel] Potential to accelerate QEMU for specific architectures
Date: Mon, 27 May 2013 08:59:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6

On 26/05/2013 18:35, Lior Vernia wrote:
> What about no to the first bullet but yes to the second (just x86 on
> ARM)? Any room for significant improvement in that case, starting from
> the foundations of QEMU?

You could write a target-specific translator, yes.  But first of all I
would establish whether you're using 32- or 64-bit, and run some
profiling to see where the hotspot is in your case.

I know that in some scenarios helpers for SSE take a considerable amount
of time (5-10%).  You could look at adding SIMD data types to TCG, and
map them to Neon operations or even to fully-unrolled loops.
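
To make the "fully-unrolled loops" option concrete, here is a minimal
sketch in plain C.  It is not QEMU or TCG code: the vec128 type and the
vec_add16 names are hypothetical, and a 16-bit lane-wise add (what
paddw does) is only an example operation.  The idea is that the
generated code would do the eight lane operations inline instead of
calling a helper.

```c
#include <stdint.h>

/* Hypothetical 128-bit guest register, viewed as eight 16-bit lanes. */
typedef struct { uint16_t w[8]; } vec128;

/* Helper-style version: a loop over the lanes, wrapping on overflow. */
static void vec_add16_loop(vec128 *d, const vec128 *a, const vec128 *b)
{
    for (int i = 0; i < 8; i++) {
        d->w[i] = (uint16_t)(a->w[i] + b->w[i]);
    }
}

/* Fully-unrolled version: the same eight adds with no loop control. */
#define ADD_LANE(i) d->w[i] = (uint16_t)(a->w[i] + b->w[i])
static void vec_add16_unrolled(vec128 *d, const vec128 *a, const vec128 *b)
{
    ADD_LANE(0); ADD_LANE(1); ADD_LANE(2); ADD_LANE(3);
    ADD_LANE(4); ADD_LANE(5); ADD_LANE(6); ADD_LANE(7);
}
#undef ADD_LANE
```

On a host with Neon, a backend could instead map the whole operation to
a single vector add; the unrolled form is the fallback that still
avoids the helper call.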

As in other work, ahead-of-time translation can also do a lot more
optimizations, including very aggressive dead-code elimination.  For
example, again considering SSE, something like

     pcmpeqw  %xmm0, %xmm1
     pmovmskb %xmm1, %eax
     test     %eax, %eax
     jz       ...

will be translated to a slow sequence in QEMU due to the expensive
pmovmskb.  A custom code generator can observe that %eax is dead after
the test and use a better translation of this idiom.
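
A rough sketch in plain C of what such a fused translation could
compute.  This is not QEMU code: vec128 and all function names are
hypothetical, and the fused version is only valid under the assumption
that %eax is not read again after the branch.  The first path follows
the instructions one by one; the second only answers "is any lane
equal?" and never builds the byte mask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit guest register, viewed as eight 16-bit lanes. */
typedef struct { uint16_t w[8]; } vec128;

/* pcmpeqw: each 16-bit lane of d becomes 0xFFFF if equal to s, else 0. */
static void ref_pcmpeqw(vec128 *d, const vec128 *s)
{
    for (int i = 0; i < 8; i++) {
        d->w[i] = (d->w[i] == s->w[i]) ? 0xFFFF : 0;
    }
}

/* pmovmskb: collect the top bit of each of the 16 bytes into a mask. */
static uint32_t ref_pmovmskb(const vec128 *s)
{
    uint32_t mask = 0;
    for (int i = 0; i < 8; i++) {
        mask |= (uint32_t)((s->w[i] >> 7) & 1) << (2 * i);      /* low byte */
        mask |= (uint32_t)((s->w[i] >> 15) & 1) << (2 * i + 1); /* high byte */
    }
    return mask;
}

/* Instruction-by-instruction path; returns true when jz would be taken. */
static bool jz_taken_naive(vec128 *xmm1, const vec128 *xmm0)
{
    ref_pcmpeqw(xmm1, xmm0);
    uint32_t eax = ref_pmovmskb(xmm1);
    return eax == 0;
}

/* Fused path: still updates xmm1, but skips the mask because eax is
 * assumed dead.  The branch only depends on whether any lane matched. */
static bool jz_taken_fused(vec128 *xmm1, const vec128 *xmm0)
{
    uint16_t any = 0;
    for (int i = 0; i < 8; i++) {
        uint16_t eq = (xmm1->w[i] == xmm0->w[i]) ? 0xFFFF : 0;
        xmm1->w[i] = eq;
        any |= eq;
    }
    return any == 0;
}
```

If %xmm1 is dead as well, the store into xmm1 could also be dropped, and
on an ARM host the comparison and reduction could be done with Neon
operations rather than a scalar loop.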

Also, floating-point emulation is always done in software in QEMU due to
different representations (and due to the 80-bit floating-point
registers mostly used by 32-bit x86).  This is going to be slow no
matter what.
