coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: feature request for coreutils: b2sum


From: Assaf Gordon
Subject: Re: feature request for coreutils: b2sum
Date: Tue, 1 Nov 2016 11:25:57 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0

On 11/01/2016 10:35 AM, Pádraig Brady wrote:
In your testing below showing that the blake2bp digest is
dependent on OMP_DYNAMIC env variable, that's just wrong,
whether it's a bug in the implementation or assumptions,
it's a bug.


I suspect it's a wrong assumption about the implementation.
The code has:

    #define PARALLELISM_DEGREE 4

    #if defined(_OPENMP)
      #pragma omp parallel shared(S), num_threads(PARALLELISM_DEGREE)
    #else
      for( i = 0; i < PARALLELISM_DEGREE; ++i )
    #endif
      {
    #if defined(_OPENMP)
        size_t      i = omp_get_thread_num();
    #endif

        [[DO SOMETHING WITH i INSIDE THE BLOCK]]
      }


Which seems to imply that 'num_threads(X)' is equivalent to 'for(i=0;<X;++i)' - 
you either get 4 threads or do 4 iterations serially.
But IMHO the above code is incorrect: OpenMP doesn't have to give you 4 threads 
(as evident by OMP_DYNAMIC=TRUE).


The equivalent OpenMP code should be something like this (not tested):

      #pragma omp parallel for shared(S)
      for( i = 0; i < PARALLELISM_DEGREE; ++i )
      {
        [[DO SOMETHING WITH i INSIDE THE BLOCK]]
      }

Which then executes the block correctly, regardless of number of threads (and 
also respects OMP_NUM_THREADS).
However, the current blake2 implementation might need to be adjusted for this 
modification to work efficiently,
and ensure too many threads don't cause problems (if the algorithm can handle 
only 4).


But regardless, I still think we should not include different variants of the 
same algorithm - it will cause too much confusion.

-assaf




reply via email to

[Prev in Thread] Current Thread [Next in Thread]