From: Pádraig Brady
Subject: Re: [PATCH] Improve sha*sum speed
Date: Tue, 13 Sep 2011 13:11:58 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

On 09/12/2011 03:49 PM, Loïc Le Loarer wrote:
> Hi,
>
> Here is my latest results and patch. Please find the patches to
> sha1.c, sha256.c and sha512.c attached and the "time" of the resulting
> binaries in sha_benchs.log. For all binaries, in 64 and 32 bits modes
> (.m32), I run 3 times the command "\time sha*sum zero1G" where zero1G
> is a 10^9 bytes file created by the command:
> dd if=/dev/zero of=zero1G count=1 bs=1 seek=$(( 1000 * 1000 * 1000 - 1 ))

Note that using a sparse file should eliminate some I/O overhead
and caching issues. I'm using: truncate -s1G 1G

>
> The compilation of coreutils was done using the command
> make CFLAGS="-O3"

I used -O2 -march=corei7-avx

> for 64 bit version and
> make CFLAGS="-m32 -O3"
> for 32 bit version.
>
> gcc is version 4.4.5 (Ubuntu 10.10)

gcc version 4.6.0 20110603 (Red Hat 4.6.0-10)

> My CPU is a Sandy Bridge @2.5GHz.

Sandy Bridge i3-2310M CPU @ 2.10GHz

>
> For sha1, the result is very close to Linus' version for git.
>
> I think it could be a good idea to include those patches to improve
> the C versions, it is probably close to the best it can be done in
> "pure" C.
>
> To improve further, assembly with or without SSE could be done in a second
> pass.
>
> What do you think of that ?
>
> I don't have a GCC farm access yet, so I can only test on my system for now.

Just summarising your results for 1G of data:

sha1   \  orig    new
32 bit |  5.15s   2.93s
64 bit |  3.54s   2.59s

I'm not seeing any improvement on my Sandy Bridge system:

sha1   \  orig    new
64 bit |  5.5s    5.5s

Is the new GCC perhaps better able to handle the old code?
Though you said you tried both gcc-4.6.1 and gcc-4.4.5
with no significant difference
(maybe Red Hat have tweaks to their GCC?)

I am seeing a halving of the branch instructions though,
which should help a lot on Intel P4 CPUs, for example
(see the attached perf output, obtained using the
attached perf-hw script).
Actually, with GCC -O3 rather than -O2 there is the same
halving of branch instructions with either the new or the old code.

I'd like to find out why your Sandy Bridge system
is giving double the performance.

cheers,
Pádraig.

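The sparse-file measurement discussed above could be sketched roughly as follows. This is not the script either poster used: the scratch file name, the three-run loop, the millisecond timing via GNU date, and the SIZE variable (defaulting to 100M so the sketch runs quickly; the thread used 1G) are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: scratch file, run count and default size are illustrative.
set -eu

size=${SIZE:-100M}              # the thread used 1G; truncate's "M"/"G" are powers of 1024
file=$(mktemp)                  # hypothetical scratch file
trap 'rm -f "$file"' EXIT

truncate -s "$size" "$file"     # extend the file with a hole instead of writing zeros

apparent=$(stat -c %s "$file")  # size in bytes as seen by a reader (GNU stat)
echo "apparent size: $apparent bytes"

for run in 1 2 3; do
    start=$(date +%s%N)         # GNU date: nanoseconds since the epoch
    sha1sum "$file" >/dev/null
    end=$(date +%s%N)
    echo "run $run: $(( (end - start) / 1000000 )) ms"
done
```

Running it as SIZE=1G sh ./bench.sh (the script name is hypothetical) matches the file size used in the reply. Swapping in another sha1sum binary on the timed line would allow comparing the original and patched builds.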
sha1sum.orig.perf
Description: Text document
sha1sum.new.perf
Description: Text document
sha1sum.orig.generic.perf
Description: Text document
perf-hw
Description: Text document