Re: feature request for coreutils: b2sum
From: Samuel Neves
Subject: Re: feature request for coreutils: b2sum
Date: Tue, 1 Nov 2016 15:52:12 +0100

On 11/01/2016 03:35 PM, Pádraig Brady wrote:
> Yes I didn't like the thread hardcoding at all, and was going to analyze
> before release.
> Zooko/Samuel, is the digest value dependent on number of threads?
> Did parallelism efficiency fall off after 4?

BLAKE2sp and BLAKE2bp have 8 and 4 lanes respectively, so using more than 8
(or 4) threads is not very useful. The lane count is fixed by the design, so
the digest does not depend on how many threads compute it. You _can_ make a
tree mode out of BLAKE2 that scales significantly better than this, but we
felt at design time that the {sp,bp} variants were simple and good enough for
regular usage.
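
As a rough illustration of why the thread count does not enter the digest,
here is a minimal Python sketch of a 4-lane, BLAKE2bp-style construction built
on the tree parameters of hashlib.blake2b. The names `lanes_digest` and `_leaf`
are hypothetical, and the parameter choices follow my reading of the BLAKE2bp
layout; the output has not been checked against the official test vectors, so
treat it as a sketch and not as a BLAKE2bp implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

LANES = 4    # lane count of the BLAKE2bp-style layout (fixed, not tunable)
BLOCK = 128  # BLAKE2b block size in bytes
OUT = 64     # digest size in bytes


def _leaf(data, index):
    # Hypothetical helper: hash every LANES-th 128-byte block, starting at
    # block number `index`, as leaf node `index` of a depth-2 tree.
    h = hashlib.blake2b(digest_size=OUT, fanout=LANES, depth=2, leaf_size=0,
                        node_offset=index, node_depth=0, inner_size=OUT,
                        last_node=(index == LANES - 1))
    for off in range(index * BLOCK, len(data), LANES * BLOCK):
        h.update(data[off:off + BLOCK])
    return h.digest()


def lanes_digest(data, workers):
    # `workers` only controls how the LANES leaves are scheduled;
    # pool.map returns the leaf digests in lane order regardless.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = list(pool.map(lambda i: _leaf(data, i), range(LANES)))
    root = hashlib.blake2b(digest_size=OUT, fanout=LANES, depth=2, leaf_size=0,
                           node_offset=0, node_depth=1, inner_size=OUT,
                           last_node=True)
    root.update(b"".join(leaves))
    return root.digest()
```

Calling `lanes_digest(data, 1)` and `lanes_digest(data, 16)` gives the same
bytes: extra workers beyond the 4 lanes simply sit idle.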

Note also that you can get significantly better performance out of
BLAKE2{sp, bp} by appropriate use of vector units. For example, the code at
https://github.com/sneves/blake2-avx2 speeds up both BLAKE2bp and BLAKE2sp by
around 2x, to ~1.5 cycles per byte, using AVX2 and a single thread.
This would be pretty fast and would avoid problems with OpenMP/pthreads/etc.
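
To put a cycles-per-byte figure in perspective, here is a rough conversion to
throughput. The 3 GHz clock is an assumed figure for illustration, not a
measurement, and `throughput_mb_s` is a hypothetical helper name.

```python
def throughput_mb_s(cycles_per_byte, clock_hz):
    # bytes per second = cycles per second / cycles per byte
    return clock_hz / cycles_per_byte / 1e6


# Assumed 3 GHz core running at ~1.5 cycles per byte
print(throughput_mb_s(1.5, 3.0e9))
```

At that assumed clock, 1.5 cycles per byte works out to about 2000 MB/s on a
single thread.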

All that being said, if you're going to choose one variant of BLAKE2 to use,
I think blake2b is a perfectly adequate choice and I have nothing against it.