Re: feature request for coreutils: b2sum
From: Pádraig Brady
Subject: Re: feature request for coreutils: b2sum
Date: Mon, 08 Jun 2015 22:56:25 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
On 08/06/15 22:17, Taylor R Campbell wrote:
> Date: Mon, 08 Jun 2015 21:24:30 +0100
> From: Padraig Brady <address@hidden>
>
> > On 08/06/15 21:08, Taylor R Campbell wrote:
> > > Zooko asked me to send the following timings of portable BLAKE2 C code
> > > versus the hand-optimized assembly for MD5 and portable C for SHA-256
> > > that one finds in OpenSSL 1.0.1k, computed on a 1.2 GHz Freescale
> > > i.MX6 CPU (on a different file, from /dev/urandom, of the same size as
> > > Zooko reported timings for, 1073741824 bytes):
>
> > Questions...
>
> You probably shouldn't read too much into this crude measurement.
>
> Here is a much more precise performance comparison, closer to what you
> will find in SUPERCOP (<http://bench.cr.yp.to/>, which is where you
> should look for high-quality performance comparisons of crypto
> algorithms):
>
> http://mumble.net/~campbell/tmp/blake2.imx6
>
> The first number on each line is the size of the message in bytes.
> The remaining numbers are nanoseconds per byte, measured by
> clock_gettime(CLOCK_MONOTONIC) before and after computing the hash,
> averaged over 16 trials. The +1 means the input buffer was unaligned.
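A minimal C sketch of the measurement described above, assuming a hypothetical one-shot hash signature `void hash(const unsigned char *msg, size_t len, unsigned char *out)`; this is not the BLAKE2 reference API or OpenSSL's. It reads CLOCK_MONOTONIC around each call, averages nanoseconds per byte over a number of trials (16 in the quoted data), and can offset the buffer by one byte to mimic the "+1" unaligned rows. `toy_xor_hash` is a placeholder for exercising the harness, not a real hash.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical one-shot hash signature used only by this sketch. */
typedef void (*hash_fn)(const unsigned char *msg, size_t len,
                        unsigned char *out);

/* Consumes part of each digest so the timed call is not optimized away. */
static volatile unsigned char sink;

/* Placeholder "hash" for trying the harness: XOR of all bytes into out[0].
 * Swap in the real function under test. */
static void toy_xor_hash(const unsigned char *msg, size_t len,
                         unsigned char *out)
{
    unsigned char acc = 0;
    for (size_t i = 0; i < len; i++)
        acc ^= msg[i];
    out[0] = acc;
}

/* Current CLOCK_MONOTONIC reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Mean nanoseconds per byte over `trials` calls of `hash` on a `len`-byte
 * message. A nonzero `misalign` starts the message one byte past the
 * malloc'd (aligned) pointer, like the "+1" rows. Returns -1.0 on bad
 * arguments or allocation failure. */
static double ns_per_byte(hash_fn hash, size_t len, int misalign,
                          unsigned trials)
{
    unsigned char out[64]; /* room for up to a 512-bit digest */
    unsigned char *buf;
    double total = 0.0;

    if (len == 0 || trials == 0)
        return -1.0;
    buf = malloc(len + 1);
    if (buf == NULL)
        return -1.0;
    memset(buf, 0xa5, len + 1); /* contents are irrelevant to the timing */

    for (unsigned i = 0; i < trials; i++) {
        const unsigned char *msg = buf + (misalign ? 1 : 0);
        double t0 = now_ns();
        hash(msg, len, out);
        double t1 = now_ns();
        sink ^= out[0];
        total += (t1 - t0) / (double)len;
    }
    free(buf);
    return total / trials;
}
```

For example, `ns_per_byte(toy_xor_hash, 4096, 1, 16)` averages 16 runs over an unaligned 4096-byte message. Averaging the per-trial rates is the same as total time over total bytes here, since every trial uses the same length.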
>
> The BLAKE2 code, and timing code, for those data are at
>
> http://mumble.net/~campbell/hg/blake2
>
> with the MD5 and SHA-256 timing code adapted slightly to use OpenSSL's
> API instead of the BSD libc API for MD5 and SHA-256.
>
> (Yes, that code should use the ARM cycle counter instead of
> clock_gettime(CLOCK_MONOTONIC). Patches welcome!)
>
> The rest of this message is about the less precise measurements of the
> code at <https://blake2.net/> previously under discussion here.
>
> > Does the file fit in cache?
>
> Yes. The machine has 4 GB of RAM.
>
> > A file about a quarter of the size would be enough for this test, I think.
>
> Yes. I used 1073741824 bytes because that is what zooko had used.
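For the cache question, a rough C sketch of comparing a file's size with physical RAM. It assumes the platform reports `_SC_PHYS_PAGES` through `sysconf` (glibc, the BSDs and macOS do, but POSIX does not require it), and the "fraction of RAM" threshold is an arbitrary heuristic, not a guarantee that the page cache will keep the file. The helper names are hypothetical. By the numbers in this thread, 1073741824 bytes is 2^30 bytes (1 GiB), about a quarter of the machine's 4 GB.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Size in bytes of the file at `path`, or -1 if stat() fails. */
static long long file_size_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Physical RAM in bytes, or -1 if sysconf() cannot report it.
 * _SC_PHYS_PAGES is a common extension, not required by POSIX. */
static long long phys_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0)
        return -1;
    return (long long)pages * (long long)page_size;
}

/* Heuristic only: 1 if `size` bytes is at most `fraction` of physical
 * RAM, 0 if larger, -1 if either quantity is unknown. It says nothing
 * about how much RAM is actually free for the page cache. */
static int fits_in_ram(long long size, double fraction)
{
    long long ram = phys_ram_bytes();
    if (ram < 0 || size < 0)
        return -1;
    return (double)size <= fraction * (double)ram;
}
```

Typical use would be `fits_in_ram(file_size_bytes("randfile.0"), 0.5)`; a failed `stat` propagates as -1.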
>
> > Were the md5sum, sha256sum, and sha512sum below from coreutils
> > built with ./configure --with-openssl=yes?
>
> On second thought, I'm not sure: md5sum and sha256sum are not linked
> against libcrypto, so perhaps not. It was from the Debian jessie
> coreutils 8.23-4 package for armhf.
Right, it's not enabled by default there.
> On the other hand, I get about the same timings from `openssl md5' and
> `openssl sha256', so perhaps md5sum and sha256sum were just statically
> linked against OpenSSL.
Probably the ARM-specific code (if any) is just not significantly
different from straight C. Note that on x86_64 the biggest difference (40%)
was with sha1sum anyway.
>
> > > $ time md5sum randfile.0
> > > 7af160fa500c6ad20be1c8119c9141f8 randfile.0
> > >
> > > real 0m9.132s
> > > user 0m6.600s
> > > sys 0m2.530s
>
> > I presume this was precached?
>
> Yes. I warmed the cache by running each program twice first.
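Running each program twice is a simple way to warm the cache. An alternative, sketched here in C and not what was done for these timings, is to read the file once and discard the data before timing anything; with enough free RAM, later reads should then be served from the page cache. `warm_page_cache` is a hypothetical helper name.

```c
#include <stdio.h>

/* Read the whole file at `path` and discard the data so that, RAM
 * permitting, later reads are served from the page cache. Returns the
 * number of bytes read, or -1 on open/read error. */
static long long warm_page_cache(const char *path)
{
    static unsigned char buf[1 << 16]; /* 64 KiB read chunks */
    long long total = 0;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += (long long)n;
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return total;
}
```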
>
> > > $ time b2sum randfile.0
> > > ea2c77e755d0f5c84e9fff444cd6ce83a566b134d43e4fe37ed53886e0ca5c7e6141968498d5d765c4190e4b567c437337e8e57ef5ba9306cc11db29a4b9e987 randfile.0
> > >
> > > real 0m48.012s
> > > user 0m46.070s
> > > sys 0m1.900s
>
> > I presume the above was for sha512sum?
>
> This was BLAKE2b, i.e. the 512-bit BLAKE2 hash function, which is the
> default for b2sum. I copied zooko's invocations verbatim.
>
> > > $ time b2sum -a blake2sp randfile.0
> > > 2886c0adfd613381d02f18a8ed18527c98d88b115a974e61e030fb914118bd0d randfile.0
> > >
> > > real 0m9.880s
> > > user 0m23.610s
> > > sys 0m3.260s
>
> > So this b2sum implementation is multithreaded
> > and has about the same total computational cost as sha256sum?
>
> It appears to be multithreaded with OpenMP. I'm using more or less
> the same BLAKE2 code that zooko reported from <https://blake2.net/>,
> specifically blake2_code_20150529.zip.
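To relate the `time` figures to the ns/byte units of the more precise benchmark, and to see the multithreading directly, a small C sketch (hypothetical helper names) computes wall-clock nanoseconds per byte and the ratio (user + sys) / real, which approximates how many CPUs were busy on average.

```c
/* Seconds reported by the shell's `time` for one run. */
struct time_report {
    double real_s;
    double user_s;
    double sys_s;
};

/* Wall-clock nanoseconds per byte for a run that hashed `bytes` bytes. */
static double wall_ns_per_byte(struct time_report t, double bytes)
{
    return t.real_s * 1e9 / bytes;
}

/* (user + sys) / real: the average number of CPUs kept busy. Values well
 * above 1 mean more than one CPU was working at the same time. */
static double cpu_parallelism(struct time_report t)
{
    return (t.user_s + t.sys_s) / t.real_s;
}
```

With the figures quoted above for the 1073741824-byte file, the ratio comes out near 1.0 for md5sum (9.130 / 9.132) and BLAKE2b (47.970 / 48.012) but about 2.7 for blake2sp (26.870 / 9.880), which fits work being spread over more than one core. The wall-clock rates are roughly 8.5, 44.7 and 9.2 ns/byte respectively; they include file I/O and process overhead, so they are not directly comparable to per-call measurements.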
thanks,
Pádraig.