[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: feature request for coreutils: b2sum
From: |
Assaf Gordon |
Subject: |
Re: feature request for coreutils: b2sum |
Date: |
Tue, 1 Nov 2016 11:25:57 -0400 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0 |
On 11/01/2016 10:35 AM, Pádraig Brady wrote:
In your testing below showing that the blake2bp digest is
dependent on OMP_DYNAMIC env variable, that's just wrong,
whether it's a bug in the implementation or assumptions,
it's a bug.
I suspect it's a wrong assumption about the implementation.
The code has:
#define PARALLELISM_DEGREE 4
#if defined(_OPENMP)
#pragma omp parallel shared(S), num_threads(PARALLELISM_DEGREE)
#else
for( i = 0; i < PARALLELISM_DEGREE; ++i )
#endif
{
#if defined(_OPENMP)
size_t i = omp_get_thread_num();
#endif
[[DO SOMETHING WITH i INSIDE THE BLOCK]]
}
Which seems to imply that 'num_threads(X)' is equivalent to 'for(i=0;<X;++i)' -
you either get 4 threads or do 4 iterations serially.
But IMHO the above code is incorrect: OpenMP doesn't have to give you 4 threads
(as evident by OMP_DYNAMIC=TRUE).
The equivalent OpenMP code should be something like this (not tested):
#pragma omp parallel for shared(S)
for( i = 0; i < PARALLELISM_DEGREE; ++i )
{
[[DO SOMETHING WITH i INSIDE THE BLOCK]]
}
Which then executes the block correctly, regardless of number of threads (and
also respects OMP_NUM_THREADS).
However, the current blake2 implementation might need to be adjusted for this
modification to work efficiently,
and ensure too many threads don't cause problems (if the algorithm can handle
only 4).
But regardless, I still think we should not include different variants of the
same algorithm - it will cause too much confusion.
-assaf