Re: Linus' sha1 is much faster!
From: Linus Torvalds
Subject: Re: Linus' sha1 is much faster!
Date: Sun, 16 Aug 2009 15:47:08 -0700 (PDT)
User-agent: Alpine 2.01 (LFD 1184 2008-12-16)
On Mon, 17 Aug 2009, Giuseppe Scrivano wrote:
>
> Thanks for the hint. I tried gcc-4.4 and it produces slower code than
> 4.3 on the gnulib SHA1 implementation and my patch makes it even more!
Check out the asm, see if you can see why. One of the most common problems
with P4's is literally that you end up loading from the same stack slot
that you just stored to (gcc can do some really crazy spills), and that
causes a store buffer hazard replay.
My personal opinion is that Netburst is useless for trying to optimize C
code for. It's just too random.
> I noticed that on my machine your implementation is ~30-40% faster using
> SHA_ROT for rol/ror instructions than inline assembly, at least with the
> test-case Pádraig wrote. Am I the only one reporting it?
I bet it's the same thing. Small perturbations of the source causing small
changes to register allocation and thus spilling, and then Netburst goes
crazy one way or another. It's interesting trying to fix it, and very
frustrating.
My workstation is a Nehalem (but Core 2 will have pretty much the same
behavior), and it doesn't have the crazy netburst behavior. Shorter and
simpler code generally performs better (which is _not_ true on Netburst).
On my machine, for example, forcing gcc to do those rotates on registers
is the difference between ~381MB/s and 415MB/s. And that's mainly because
it makes gcc keep A-E in registers, rather than trying to cache the
array[] references.
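
To make "doing the rotates on registers" concrete, here is a minimal sketch
of the idea. It is an illustration only, not the code from the patch under
discussion: the names rol32_portable, ROL32_REG and demo_round are
hypothetical, and the inline asm assumes gcc (or a compatible compiler) on
x86. The "0" constraint ties the input to the output register, so the
rotate operand has to live in a register rather than in memory.

```c
#include <stdint.h>

/* Portable rotate-left; most compilers recognize this idiom as a rotate. */
static inline uint32_t rol32_portable(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/*
 * Hypothetical register-forcing variant (assumption: gcc-style extended
 * asm on x86). n must be a compile-time constant because of "i".
 */
#define ROL32_REG(x, n) __extension__ ({                      \
    uint32_t r_;                                              \
    __asm__("roll %1,%0" : "=r"(r_)                           \
                         : "i"(n), "0"((uint32_t)(x)));       \
    r_; })
#else
/* Fallback for other compilers and targets. */
#define ROL32_REG(x, n) rol32_portable((x), (n))
#endif

/*
 * One SHA-1 step of the first round group (t = 0..19), written so that
 * A-E are plain locals the compiler can keep in registers.
 */
static void demo_round(uint32_t s[5], uint32_t w)
{
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4];
    uint32_t f = ((c ^ d) & b) ^ d;     /* same as (b & c) | (~b & d) */
    uint32_t t = ROL32_REG(a, 5) + f + e + w + 0x5A827999u;

    e = d;
    d = c;
    c = ROL32_REG(b, 30);               /* rotate left 30 == rotate right 2 */
    b = a;
    a = t;

    s[0] = a; s[1] = b; s[2] = c; s[3] = d; s[4] = e;
}
```

Whether the asm variant or the plain C one wins depends on the compiler
version and the CPU, as the numbers in this thread show, so measure both.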
Linus