Re: Linus' sha1 is much faster!
From: Linus Torvalds
Subject: Re: Linus' sha1 is much faster!
Date: Mon, 17 Aug 2009 09:22:56 -0700 (PDT)
User-agent: Alpine 2.01 (LFD 1184 2008-12-16)
On Mon, 17 Aug 2009, Steven Noonan wrote:
>
> Interesting. I compared Linus' implementation to the public domain one
> by Steve Reid[1]
You _really_ need to talk about what kind of environment you have.
There are three major issues:
- Netburst vs non-netburst
- 32-bit vs 64-bit
- compiler version
Steve Reid's code looks great, but the way it is coded, gcc makes a mess
of it, which is exactly what my SHA1 tries to avoid.
[ In contrast, gcc does very well on just about _any_ straightforward
unrolled SHA1 C code if the target architecture is something like PPC or
ia64 that has enough registers to keep it all in registers.
I haven't really tested other compilers - a less aggressive compiler
would actually do _better_ on SHA1, because the problem with gcc is that
it turns the whole temporary 16-entry word array into register accesses,
and tries to do register allocation on that _array_.
That is wonderful for the above-mentioned PPC and IA64, but it makes gcc
create totally crazy code when there aren't enough registers, and then
gcc starts spilling randomly (i.e. it starts spilling a-e etc). This is
why the compiler and its version matter so much. ]
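To make the register-allocation problem concrete, here is a minimal sketch in C of the 16-entry word array used as a circular window over the SHA-1 message schedule. All names (rol32, W, SET_W, MIX, schedule_windowed, schedule_reference, schedules_match) are made up for illustration. The volatile store in SET_W is an assumption about one way to keep a compiler from promoting the array into registers; it is not taken from this message, and whether it helps at all depends on the compiler and version.

```c
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static inline uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* 16-entry circular window over the SHA-1 message schedule. */
#define W(i) (w[(i) & 15])

/*
 * Assumption (illustrative only): storing through a volatile lvalue makes
 * the compiler emit a real memory store, so w[] is less likely to be
 * treated as 16 separate register candidates.
 */
#define SET_W(i, val) (*(volatile uint32_t *)&W(i) = (val))

/*
 * Schedule word t for 16 <= t < 80. Modulo 16, the indices t-3, t-8,
 * t-14 and t-16 are the same slots as t+13, t+8, t+2 and t.
 */
#define MIX(t) rol32(W((t) + 13) ^ W((t) + 8) ^ W((t) + 2) ^ W(t), 1)

/* Expand a 16-word block into 80 schedule words using only w[16]. */
static void schedule_windowed(const uint32_t block[16], uint32_t out[80])
{
    uint32_t w[16];

    for (int t = 0; t < 16; t++) {
        SET_W(t, block[t]);
        out[t] = W(t);
    }
    for (int t = 16; t < 80; t++) {
        uint32_t v = MIX(t);   /* read the old slots before overwriting */
        SET_W(t, v);
        out[t] = v;
    }
}

/* Reference version: plain 80-entry expansion, no circular window. */
static void schedule_reference(const uint32_t block[16], uint32_t out[80])
{
    memcpy(out, block, 16 * sizeof(uint32_t));
    for (int t = 16; t < 80; t++)
        out[t] = rol32(out[t - 3] ^ out[t - 8] ^ out[t - 14] ^ out[t - 16], 1);
}

/* Returns 1 if both expansions agree on an arbitrary test block. */
static int schedules_match(void)
{
    uint32_t block[16], a[80], b[80];

    for (int i = 0; i < 16; i++)
        block[i] = (uint32_t)i * 0x9E3779B9u + 1u;
    schedule_windowed(block, a);
    schedule_reference(block, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The point of the window is that only 16 words are live at any time. On a register-starved target, those 16 words plus the a-e state cannot all sit in registers, so something has to live in memory; the sketch simply makes that choice explicit for the array.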
> (average of 5 runs)
> Linus' sha1: 283MB/s
> Steve Reid's sha1: 305MB/s
So I get very different results:
#        TIME[s]  SPEED[MB/s]
Reid      2.742      222.6
linus     1.464      417
This is on an Intel Nehalem, but compiled for 32-bit mode (which is the
more challenging one, because x86-32 only has 7 usable general-purpose
registers), and with gcc-4.4.0.
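As a quick sanity check on those two rows, the sketch below multiplies TIME by SPEED to get the implied input size and divides the two times to get the speedup. The struct and function names are hypothetical. It assumes both runs hashed the same input, which the message does not state, although both rows do work out to about 610 MB.

```c
/* One row of the timing table above. */
struct run {
    const char *name;
    double seconds;   /* TIME[s] column */
    double mb_per_s;  /* SPEED[MB/s] column */
};

static const struct run reid  = { "Reid",  2.742, 222.6 };
static const struct run linus = { "linus", 1.464, 417.0 };

/* Input size implied by a row: time multiplied by throughput. */
static double implied_mb(struct run r)
{
    return r.seconds * r.mb_per_s;
}

/* How many times faster `fast` is than `slow`, by wall-clock time. */
static double speedup(struct run slow, struct run fast)
{
    return slow.seconds / fast.seconds;
}
```

With these figures, speedup(reid, linus) comes out a little under 1.9.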
Linus