

Re: [PATCH] Improve sha*sum speed

From: Pádraig Brady
Subject: Re: [PATCH] Improve sha*sum speed
Date: Tue, 13 Sep 2011 13:11:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

On 09/12/2011 03:49 PM, Loïc Le Loarer wrote:
> Hi,
> Here are my latest results and patches. Please find the patches to
> sha1.c, sha256.c and sha512.c attached and the "time" of the resulting
> binaries in sha_benchs.log. For all binaries, in 64 and 32 bit modes
> (.m32), I ran the command "\time sha*sum zero1G" 3 times, where zero1G
> is a 10^9 byte file created by the command:
> dd if=/dev/zero of=zero1G count=1 bs=1 seek=$(( 1000 * 1000 * 1000 - 1 ))

Note that using a sparse file should eliminate
some I/O overhead and caching issues.
I'm using: truncate -s1G 1G

> The compilation of coreutils was done using the command
> make CFLAGS="-O3"

I used -O2 -march=corei7-avx

> for 64 bit version and
> make CFLAGS="-m32 -O3"
> for 32 bit version.
> gcc is version 4.4.5 (Ubuntu 10.10)

gcc version 4.6.0 20110603 (Red Hat 4.6.0-10)

> My CPU is a Sandy Bridge @2.5GHz.

Sandy Bridge i3-2310M CPU @ 2.10GHz

> For sha1, the result is very close to Linus' version for git.
> I think it could be a good idea to include those patches to improve
> the C versions; they are probably close to the best that can be done in
> "pure" C.
> To improve further, assembly with or without SSE could be done in a second
> pass.
> What do you think of that?
> I don't have a GCC farm access yet, so I can only test on my system for now.

Just summarising your results for 1G of data

sha1  \  orig    new
32 bit | 5.15s   2.93s
64 bit | 3.54s   2.59s

I'm not seeing any improvement on my Sandy Bridge system?

sha1  \  orig    new
64 bit | 5.5s    5.5s

Perhaps the newer GCC is better able to handle the old code?
Though you said you tried both gcc-4.6.1 and gcc-4.4.5 with
no significant difference (maybe Red Hat have tweaks to their GCC?)

I am seeing the number of branch instructions halved, though,
which should help a lot on Intel P4 CPUs, for example
(see the attached perf output, obtained using the attached perf-hw script).
Actually, with GCC -O3 rather than -O2 there is the same
halving of branch instructions with either the new or the old code.

I'd like to find out why your Sandy Bridge system
is giving double the performance.


Attachment: sha1sum.orig.perf
Description: Text document

Description: Text document

Attachment: sha1sum.orig.generic.perf
Description: Text document

Attachment: perf-hw
Description: Text document

