coreutils

Re: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation


From: Kristoffer Brånemyr
Subject: Re: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation
Date: Fri, 12 Mar 2021 15:33:00 +0000 (UTC)

Hi,
I was just wondering whether you are planning to merge the change, or whether
you decided against it? :) I wanted to use the cpuid.h autoconf detection for
another patch I'm working on.

-- 
/Kristoffer Brånemyr 

    On Saturday, 13 February 2021 at 14:06:31 CET, Pádraig Brady
<p@draigbrady.com> wrote:
 
 On 13/02/2021 07:38, Kristoffer Brånemyr via GNU coreutils General Discussion 
wrote:
> Hi,
> I implemented another improvement to cksum to speed it up further. The x86 
> pclmul hardware instruction can be used for the CRC32 calculation. The patch 
> detects support for it at run time using CPUID, and falls back to the 
> slice-by-8 algorithm if the instruction is unavailable. I also added 
> detection in autoconf, so the code is only compiled on supported targets.
> 
> In my testing, the checksum calculation is about 6x faster than the 
> slice-by-8 algorithm (looking at user time). However, since the time the 
> process spends waiting on syscalls (fread) stays the same, the actual 
> real-time speedup is only about 3x. It would be an interesting exercise to 
> try async IO, so that one block could be checksummed while the next is being 
> read. Maybe I will try that one day.
> 
> As a side note, x86 also has a crc32 hardware instruction, but it uses a 
> different polynomial than cksum does, so it cannot be used here.
> 
> Some benchmarking with a file already in file cache.
> Oldest version: (byte by byte)
> ztion@rita:~/coreutils/coreutils-8.32/src$ time ./cksum 
> /disk2/download/bigfile2G
> 
> real    0m7,311s
> user    0m7,039s
> sys    0m0,262s
> 
> Slice by 8 version:
> ztion@rita:~/coreutils/coreutils-8.32/src$ time ./cksum.slice 
> /disk2/download/bigfile2G
> 
> real    0m1,546s
> user    0m1,267s
> sys    0m0,247s
> 
> pclmul version:
> ztion@rita:~/coreutils/coreutils_fork/src$ time ./cksum 
> /disk2/download/bigfile2G
> 
> real    0m0,462s
> user    0m0,191s
> sys    0m0,271s
> 
> 
> 
> The patch is at:
> https://github.com/coreutils/coreutils/pull/48
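To illustrate the side note about the x86 crc32 instruction: that instruction computes CRC-32C (Castagnoli polynomial 0x1EDC6F41, bit-reflected), while POSIX cksum uses polynomial 0x04C11DB7, processes bits MSB-first, and appends the message length before the final complement. The following is a minimal bit-at-a-time sketch of both, not the code from the patch; the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Feed LEN bytes into a POSIX-cksum-style CRC: polynomial 0x04C11DB7,
   most significant bit first, no reflection.  */
static uint32_t
crc32_posix_update (uint32_t crc, const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      crc ^= (uint32_t) buf[i] << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
  return crc;
}

/* Full cksum value: CRC of the data, then of the length encoded as
   little-endian bytes with no leading zero bytes, then complemented.  */
static uint32_t
posix_cksum (const unsigned char *buf, size_t len)
{
  uint32_t crc = crc32_posix_update (0, buf, len);
  for (size_t n = len; n != 0; n >>= 8)
    {
      unsigned char byte = n & 0xFF;
      crc = crc32_posix_update (crc, &byte, 1);
    }
  return ~crc;
}

/* CRC-32C as computed by the x86 crc32 instruction family: reflected
   form of polynomial 0x1EDC6F41 (0x82F63B78), initial value and final
   XOR of 0xFFFFFFFF.  */
static uint32_t
crc32c (const unsigned char *buf, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    {
      crc ^= buf[i];
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
  return ~crc;
}
```

Because the two CRCs differ in polynomial, bit order, and initial value, they give unrelated results for the same input, so the crc32 instruction cannot stand in for the cksum calculation.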

Very nice work.

The combination of compile time vs run time checks looks general enough,
and should work for all systems / cross compilation targets at first glance.
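A rough sketch of how such a compile-time plus run-time combination might look in C, assuming GCC or Clang on x86. This is not the code from the patch: HAVE_CPUID_PROBE, CPUID_ECX_PCLMULQDQ, pclmul_supported and choose_crc_impl are names invented for this example, and in the real patch the compile-time part would come from autoconf rather than __has_include.

```c
#include <stdbool.h>

/* Compile-time check: build the CPUID probe only on x86 targets where
   <cpuid.h> is available.  HAVE_CPUID_PROBE is a hypothetical macro
   standing in for whatever autoconf would define.  */
#if (defined __x86_64__ || defined __i386__) && defined __has_include
# if __has_include (<cpuid.h>)
#  include <cpuid.h>
#  define HAVE_CPUID_PROBE 1
# endif
#endif

/* CPUID leaf 1 reports PCLMULQDQ support in bit 1 of ECX.  */
#define CPUID_ECX_PCLMULQDQ (1u << 1)

enum crc_impl { CRC_IMPL_SLICE8, CRC_IMPL_PCLMUL };

/* Run-time check: ask the CPU the program is actually running on.  */
static bool
pclmul_supported (void)
{
#ifdef HAVE_CPUID_PROBE
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;
  return (ecx & CPUID_ECX_PCLMULQDQ) != 0;
#else
  /* Probe not compiled in: always use the portable code path.  */
  return false;
#endif
}

/* Pick the implementation once, e.g. at program start.  */
static enum crc_impl
choose_crc_impl (void)
{
  return pclmul_supported () ? CRC_IMPL_PCLMUL : CRC_IMPL_SLICE8;
}
```

With this split, a binary built for a generic x86 target still runs on CPUs without the instruction, and non-x86 or cross-compilation targets simply never compile the probe.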

The win looks significant enough to warrant the extra complexity.

I'll close the pull request for bookkeeping reasons,
but it's fine to post the patch there.

thanks!
Pádraig
  
