On 2013-04-02 07:41, Alexander Graf wrote:
On 2013-04-01 23:34, Alexander Graf wrote:
Is this faster than a load/store with std/ldbrx?
Hmm. Almost certainly not. And since we've got stack space
allocated for function calls, we've got scratch space to do it in.
Probably similar for bswap32 too, eh?
Depends - memory load/store doesn't come for free and bswap32 is
quite short.
I'll do a tiny bit of benchmarking for power7.
Cool, thanks a bunch :)
Heh. "Almost certainly not" indeed. Unless I've made some silly mistake,
going through memory stalls badly. No store buffer forwarding on power7?
With the following test case, time reports:
f1 2.967s
f2 8.930s
f3 7.071s
f4 7.166s
And note that f4 is a normal store/load pair, trying to determine what
the store buffer forwarding delay might be.
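
For readers without a power7 at hand, here is a rough portable sketch of
the two strategies being compared: swapping entirely in registers versus
bouncing the value through a scratch slot in memory. This is not the test
case the timings above came from, and the function names bswap64_reg and
bswap64_mem are made up for illustration. On ppc64 the memory variant
would presumably be a plain store followed by a byte-reversed load (or the
reverse), while in plain C a compiler is free to turn either function into
whatever it likes, so this shows the idea rather than the generated code.

```c
#include <stdint.h>
#include <string.h>

/* Register-only swap: three rounds of shift-and-mask, no memory traffic. */
static uint64_t bswap64_reg(uint64_t x)
{
    /* Swap adjacent bytes. */
    x = ((x & 0x00ff00ff00ff00ffull) << 8) |
        ((x >> 8) & 0x00ff00ff00ff00ffull);
    /* Swap adjacent 16-bit halves. */
    x = ((x & 0x0000ffff0000ffffull) << 16) |
        ((x >> 16) & 0x0000ffff0000ffffull);
    /* Swap the two 32-bit halves. */
    return (x << 32) | (x >> 32);
}

/*
 * Through-memory swap: store the value to a scratch slot, then read the
 * bytes back in reverse order. The result does not depend on host
 * endianness, since only the in-memory byte order is reversed.
 */
static uint64_t bswap64_mem(uint64_t x)
{
    unsigned char slot[8], rev[8];
    uint64_t y;
    int i;

    memcpy(slot, &x, sizeof(slot));      /* the "store" */
    for (i = 0; i < 8; i++) {
        rev[i] = slot[7 - i];            /* byte-reversed view */
    }
    memcpy(&y, rev, sizeof(y));          /* the "load" */
    return y;
}
```

Both functions should agree on every input; the open question in this
thread is only which one the hardware executes faster.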