Re: [Qemu-devel] [PATCH v3 17/27] tcg-ppc64: Implement bswap64


From: Alexander Graf
Subject: Re: [Qemu-devel] [PATCH v3 17/27] tcg-ppc64: Implement bswap64
Date: Tue, 02 Apr 2013 17:23:33 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120306 Thunderbird/10.0.3

On 04/02/2013 05:12 PM, Richard Henderson wrote:
On 2013-04-02 07:41, Alexander Graf wrote:
On 2013-04-01 23:34, Alexander Graf wrote:
Is this faster than a load/store with std/ldbrx?

Hmm.  Almost certainly not.  And since we've got stack space
allocated for function calls, we've got scratch space to do it in.

Probably similar for bswap32 too, eh?

Depends - memory load/store doesn't come for free and bswap32 is quite short.


I'll do a tiny bit of benchmarking for power7.

Cool, thanks a bunch :)

Heh. "Almost certainly not" indeed. Unless I've made some silly mistake,
going through memory stalls badly.  No store buffer forwarding on power7?

With the following test case, time reports:

f1        2.967s
f2        8.930s
f3        7.071s
f4        7.166s

And note that f4 is a normal store/load pair, trying to determine what the
store buffer forwarding delay might be.
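
The test case itself is not reproduced in this message. As a rough illustration of the two strategies being compared, here is a minimal C sketch. The names bswap64_reg, bswap64_mem and check_bswap64 are hypothetical and are not the f1..f4 timed above. The second function is only a portable stand-in for a std followed by ldbrx through stack scratch space, and an optimizing compiler may well turn it into register operations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register-only byte swap: three rounds of shift-and-mask, no memory traffic. */
static uint64_t bswap64_reg(uint64_t x)
{
    /* Swap adjacent bytes. */
    x = ((x & 0x00ff00ff00ff00ffULL) << 8) | ((x >> 8) & 0x00ff00ff00ff00ffULL);
    /* Swap adjacent 16-bit halves. */
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    /* Swap the two 32-bit halves. */
    return (x << 32) | (x >> 32);
}

/* Through-memory byte swap: write the value to scratch space, then read
 * the bytes back in reverse order (stand-in for std + ldbrx; assumption). */
static uint64_t bswap64_mem(uint64_t x)
{
    unsigned char in[8], out[8];
    uint64_t r;

    memcpy(in, &x, sizeof(in));
    for (int i = 0; i < 8; i++) {
        out[i] = in[7 - i];
    }
    memcpy(&r, out, sizeof(r));
    return r;
}

/* Sanity check: both strategies must produce the same result. */
static void check_bswap64(uint64_t x)
{
    assert(bswap64_reg(x) == bswap64_mem(x));
}
```

Timing each variant in a tight loop over a dependent chain of values would be one way to reproduce this kind of comparison, though the numbers depend heavily on the compiler and the CPU.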

Yeah, doesn't look like it makes any sense at all to do a load/store cycle then. What a shame :).

Keep in mind that this only tests icache-hot cycles. The long bswap64 sequence might cost you icache misses, so the memory latency you see here might also hit the instruction stream when it gets executed. But then again, we only care about the performance of cache-hot sequences in the first place.


Alex