Re: [PATCH] Improve sha*sum speed

bug-gnulib

From:	Pádraig Brady
Subject:	Re: [PATCH] Improve sha*sum speed
Date:	Tue, 13 Sep 2011 13:11:58 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

On 09/12/2011 03:49 PM, Loïc Le Loarer wrote:
> Hi,
> 
> Here are my latest results and patch. Please find the patches to
> sha1.c, sha256.c and sha512.c attached, and the "time" output of the
> resulting binaries in sha_benchs.log. For all binaries, in 64- and
> 32-bit modes (.m32), I ran the command "\time sha*sum zero1G" 3 times,
> where zero1G is a 10^9-byte file created by the command:
> dd if=/dev/zero of=zero1G count=1 bs=1 seek=$(( 1000 * 1000 * 1000 - 1 ))

Note that using a sparse file should eliminate
some I/O overhead and caching issues.
I'm using: truncate -s1G 1G
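
For reference, a minimal sketch comparing the two ways of creating the test file mentioned above. It assumes GNU coreutils (`stat -c`, `truncate`) and a filesystem that supports holes; the temporary directory and variable names are illustrative only. Note that the two inputs are not the same size: the dd command yields 10^9 bytes, while `truncate -s1G` yields 2^30 bytes, so timings on the two files differ by about 7% in the amount of data hashed.

```shell
# Sketch only: create both test files in a scratch directory and compare sizes.
tmpdir=$(mktemp -d)

# 10^9-byte file: seek to the last byte and write a single zero byte.
dd if=/dev/zero of="$tmpdir/zero1G" count=1 bs=1 \
   seek=$(( 1000 * 1000 * 1000 - 1 )) 2>/dev/null

# 2^30-byte file: extend an empty file without writing any data.
truncate -s1G "$tmpdir/1G"

# Apparent sizes in bytes.
dd_size=$(stat -c %s "$tmpdir/zero1G")
tr_size=$(stat -c %s "$tmpdir/1G")
echo "dd: $dd_size bytes, truncate: $tr_size bytes"

# Disk usage in KiB; on a filesystem with hole support this is far
# smaller than the apparent size for both files.
du -k "$tmpdir/zero1G" "$tmpdir/1G"

rm -r "$tmpdir"
```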

> 
> The compilation of coreutils was done using the command
> make CFLAGS="-O3"

I used -O2 -march=corei7-avx

> for 64 bit version and
> make CFLAGS="-m32 -O3"
> for 32 bit version.
> 
> gcc is version 4.4.5 (Ubuntu 10.10)

gcc version 4.6.0 20110603 (Red Hat 4.6.0-10)

> My CPU is a Sandy Bridge @2.5GHz.

Sandy Bridge i3-2310M CPU @ 2.10GHz

> 
> For sha1, the result is very close to Linus' version for git.
> 
> I think it could be a good idea to include those patches to improve
> the C versions; this is probably close to the best that can be done in
> "pure" C.
> 
> To improve further, assembly with or without SSE could be done in a
> second pass.
> 
> What do you think of that?
> 
> I don't have a GCC farm access yet, so I can only test on my system for now.

Just summarising your results for 1G of data

sha1  \  orig    new
32 bit | 5.15s   2.93s
64 bit | 3.54s   2.59s
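
To put the summary above in relative terms, here is a small sketch that derives speedup and throughput from those times. The helper names `speedup` and `throughput` are made up for illustration, and the throughput figure assumes the 10^9-byte input described in the quoted message.

```shell
# Times in seconds are copied from the summary table above.

# speedup ORIG NEW: ratio of original to new run time.
speedup() {
    awk -v o="$1" -v n="$2" 'BEGIN { printf "%.2f", o / n }'
}

# throughput SECONDS: MB/s (10^6 bytes/s) for a 10^9-byte input.
throughput() {
    awk -v t="$1" 'BEGIN { printf "%.0f", 1000 / t }'
}

s32=$(speedup 5.15 2.93)
s64=$(speedup 3.54 2.59)
t64=$(throughput 2.59)

echo "32 bit: ${s32}x faster"
echo "64 bit: ${s64}x faster, about ${t64} MB/s with the new code"
```

That works out to roughly a 1.76x speedup for the 32 bit build and 1.37x for the 64 bit build.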

I'm not seeing any improvement on my Sandy Bridge system?

sha1  \  orig    new
64 bit | 5.5s    5.5s

Perhaps the newer GCC is better able to handle the old code?
Though you said you tried both gcc-4.6.1 and gcc-4.4.5 with
no significant difference (maybe Red Hat has tweaks to their GCC?)

I am seeing a halving of the branch instructions though,
which should help a lot on Intel P4 CPUs for example
(see the attached perf output, obtained using the attached perf-hw script).
Actually, with GCC at -O3 rather than -O2, there is the same
halving of branch instructions with either the new or the old code.
I'd like to find out why your Sandy Bridge system
is giving double the performance.

cheers,
Pádraig.

sha1sum.orig.perf
Description: Text document

sha1sum.new.perf
Description: Text document

sha1sum.orig.generic.perf
Description: Text document

perf-hw
Description: Text document
