Re: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation
From: Kristoffer Brånemyr
Subject: Re: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation
Date: Fri, 12 Mar 2021 15:33:00 +0000 (UTC)
Hi,
I was just wondering if you are planning to merge the change, or if you decided
against it? :) I wanted to use the cpuid.h autoconf detection for another patch
I'm working on.
--
/Kristoffer Brånemyr
On Saturday, 13 February 2021 at 14:06:31 CET, Pádraig Brady <p@draigbrady.com>
wrote:
On 13/02/2021 07:38, Kristoffer Brånemyr via GNU coreutils General Discussion
wrote:
> Hi,
> I implemented another improvement for cksum to speed it up some more. It is
> possible to use the x86 pclmul hardware instruction for the CRC32
> calculation. The patch detects support for this using CPUID, and falls
> back to the slice-by-8 algorithm if it is not supported. I also added
> detection in autoconf, so it will only be compiled on supported targets.
>
> In my testing, the checksum calculation is sped up about 6x compared to the
> slice-by-8 algorithm (looking at user time). However, since the time the
> process spends waiting on syscalls (fread) is still the same, the actual
> real-time speedup is only about 3x. It would be an interesting exercise to
> try to use async IO, so you could checksum one block while reading the next.
> Maybe I will try that one day.
>
> As a side note, x86 also has a crc32 hardware instruction, but it uses a
> different polynomial than cksum does, so it cannot be used here.
>
> Some benchmarking with a file already in file cache.
> Oldest version: (byte by byte)
> ztion@rita:~/coreutils/coreutils-8.32/src$ time ./cksum /disk2/download/bigfile2G
>
> real 0m7,311s
> user 0m7,039s
> sys 0m0,262s
>
> Slice by 8 version:
> ztion@rita:~/coreutils/coreutils-8.32/src$ time ./cksum.slice /disk2/download/bigfile2G
>
> real 0m1,546s
> user 0m1,267s
> sys 0m0,247s
>
> pclmul version (patched build):
> ztion@rita:~/coreutils/coreutils_fork/src$ time ./cksum /disk2/download/bigfile2G
>
> real 0m0,462s
> user 0m0,191s
> sys 0m0,271s
>
> The patch is at:
> https://github.com/coreutils/coreutils/pull/48
Very nice work.
The combination of compile-time and run-time checks looks general enough,
and at first glance should work for all systems and cross-compilation targets.
The win looks significant enough to warrant the extra complexity.
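
For readers following the thread, the combination of compile-time and run-time checks mentioned above could look roughly like the sketch below. It is not the code from the pull request. The names `pclmul_supported`, `choose_crc_impl`, `crc_impl_name` and `enum crc_impl` are hypothetical, and the preprocessor guard merely stands in for whatever the autoconf check defines. The only hard facts used are that GCC and Clang ship `<cpuid.h>` with `__get_cpuid`, and that CPUID leaf 1 reports PCLMULQDQ in bit 1 of ECX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Compile-time check: assumption standing in for the autoconf test.
   <cpuid.h> is a GCC/Clang header and only exists for x86 targets. */
#if (defined __x86_64__ || defined __i386__) && defined __GNUC__
# include <cpuid.h>
# define SKETCH_HAVE_CPUID 1
#endif

/* CPUID leaf 1, ECX bit 1 signals PCLMULQDQ support. */
#define CPUID1_ECX_PCLMULQDQ (1u << 1)

/* Hypothetical names for the two code paths discussed in the thread. */
enum crc_impl { CRC_SLICE8, CRC_PCLMUL };

/* Run-time check: ask the CPU we are actually running on. */
static bool pclmul_supported(void)
{
#ifdef SKETCH_HAVE_CPUID
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;               /* leaf 1 not available */
  return (ecx & CPUID1_ECX_PCLMULQDQ) != 0;
#else
  return false;                 /* not built for x86: always fall back */
#endif
}

/* Pick the fast path only when the CPU reports support. */
static enum crc_impl choose_crc_impl(void)
{
  return pclmul_supported() ? CRC_PCLMUL : CRC_SLICE8;
}

static const char *crc_impl_name(enum crc_impl impl)
{
  assert(impl == CRC_SLICE8 || impl == CRC_PCLMUL);
  return impl == CRC_PCLMUL ? "pclmul" : "slice-by-8";
}
```

A caller would run `choose_crc_impl()` once at startup and dispatch on the result. On a non-x86 build the guard removes the CPUID call entirely, so the slice-by-8 path is always chosen.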
I'll close the pull request for bookkeeping reasons,
but it's fine to post the patch there.
thanks!
Pádraig
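
On the side note in the quoted message about the crc32 instruction: POSIX cksum uses the generator polynomial 0x04C11DB7, processed most significant bit first, while the x86 crc32 instruction implements CRC-32C (Castagnoli, 0x1EDC6F41). The sketch below is a plain bitwise reference for the cksum algorithm, written for clarity rather than speed. It is neither the slice-by-8 nor the pclmul code from the patch, and the names `cksum_update` and `cksum_finish` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* POSIX cksum generator polynomial, used MSB-first (no bit reflection). */
#define CKSUM_POLY 0x04C11DB7u

/* Feed LEN bytes from BUF into the running CRC, one bit at a time. */
static uint32_t cksum_update(uint32_t crc, const unsigned char *buf, size_t len)
{
  assert(buf != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    {
      crc ^= (uint32_t) buf[i] << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ CKSUM_POLY : crc << 1;
    }
  return crc;
}

/* POSIX appends the total length, least significant octet first and
   using only as many octets as needed, then complements the result. */
static uint32_t cksum_finish(uint32_t crc, uint64_t total_len)
{
  for (; total_len != 0; total_len >>= 8)
    {
      unsigned char octet = (unsigned char) (total_len & 0xFF);
      crc = cksum_update(crc, &octet, 1);
    }
  return ~crc;
}
```

Starting from a CRC of 0, a caller would pass each block read from the file through `cksum_update`, then call `cksum_finish` with the total byte count to get the value cksum prints. For empty input this gives 4294967295, which matches `cksum /dev/null`.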