qemu-devel
Re: [Qemu-devel] Mips 64 emulation not compiling


From: J. Mayer
Subject: Re: [Qemu-devel] Mips 64 emulation not compiling
Date: Sat, 27 Oct 2007 15:22:04 +0200

On Sat, 2007-10-27 at 16:01 +0300, Blue Swirl wrote:
> On 10/27/07, J. Mayer <address@hidden> wrote:
> > I also got optimized versions of bit population count which could also
> > be shared:
> > static always_inline int ctpop32 (uint32_t val)
> > {
> >     int i;
> >
> >     for (i = 0; val != 0; i++)
> >         val = val & (val - 1);
> >
> >     return i;
> > }
> >
> > If you prefer, I can add those shared functions (ctz32, ctz64, cto32,
> > cto64, ctpop32, ctpop64) later, as they do not seem as widely used as
> > clxxx functions.
> 
> This would be interesting for Sparc64. Could you compare your version
> to do_popc() in target-sparc/op_helper.c?

My feeling is:
my implementation loops n times, n being the number of bits set in the
word, so it will always be faster than yours when only a few bits are
set.
Your implementation could be better because:
- it has a fixed cost
- it does not do any tests / jumps / loops
The drawback of your implementation is that it generates a lot of code,
so it could never be used directly in micro-ops: on my amd64 host, my
implementation compiles to 36 bytes of code and the 64-bit version does
not generate more code than the 32-bit one. Your (64-bit only)
implementation compiles to 217 bytes of code. On x86, my 32-bit
version is 49 bytes long, the 64-bit one is 79 bytes long and yours is
323 bytes long.
But this would never be a problem when called from a helper.
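
For illustration, a fixed-cost, branch-free population count can be
sketched as below. This assumes the usual parallel mask-and-add
technique; it is not copied from do_popc() in target-sparc/op_helper.c,
and the name popcount64_swar is hypothetical.

```c
#include <stdint.h>

/* Hypothetical name; illustrates the mask-and-add technique only.
 * Every call runs the same six steps, whatever the input value. */
static inline int popcount64_swar(uint64_t x)
{
    /* Sum adjacent 1-bit fields into 2-bit fields, then 4, 8, 16, 32, 64. */
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + ((x >> 32) & 0x00000000FFFFFFFFULL);
    return (int)x;
}
```

There are no tests or jumps here, but the 64-bit mask constants are a
plausible reason such a version would compile to much more code than the
loop.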

So I'm not really sure what the best choice is here.
We may have to run tests to see which of the two implementations is
more efficient.
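
Such a test could be sketched roughly as follows. Everything here is an
assumption made for illustration: the names ctpop64_loop, ctpop64_fixed
and time_popcount are hypothetical, the fixed-cost version is the
generic mask-and-add technique rather than the actual do_popc(), and
clock() is only a coarse timer.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Loop version: one iteration per set bit (clears the lowest set bit). */
static int ctpop64_loop(uint64_t val)
{
    int i;

    for (i = 0; val != 0; i++)
        val &= val - 1;

    return i;
}

/* Fixed-cost version: generic mask-and-add, same work for every input. */
static int ctpop64_fixed(uint64_t x)
{
    x = (x & 0x5555555555555555ULL) + ((x >> 1)  & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2)  & 0x3333333333333333ULL);
    x = (x & 0x0F0F0F0F0F0F0F0FULL) + ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = (x & 0x00FF00FF00FF00FFULL) + ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = (x & 0x0000FFFF0000FFFFULL) + ((x >> 16) & 0x0000FFFF0000FFFFULL);
    x = (x & 0x00000000FFFFFFFFULL) + ((x >> 32) & 0x00000000FFFFFFFFULL);
    return (int)x;
}

typedef int (*popcount_fn)(uint64_t);

/* Returns the CPU seconds spent counting bits in vals[0..n-1], repeated
 * `rounds` times. *sum receives the total bit count so the compiler
 * cannot discard the work and so both versions can be cross-checked. */
static double time_popcount(popcount_fn fn, const uint64_t *vals, size_t n,
                            unsigned rounds, uint64_t *sum)
{
    uint64_t total = 0;
    clock_t start = clock();

    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < n; i++)
            total += (uint64_t)fn(vals[i]);

    *sum = total;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Since the cost of the loop version depends on the number of bits set,
sparse and dense input sets should be timed separately, and the two
sums compared to check that both versions agree.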

-- 
J. Mayer <address@hidden>
Never organized




