coreutils

Re: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation


From: Kaz Kylheku (Coreutils)
Subject: Re: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation
Date: Fri, 12 Mar 2021 08:18:48 -0800
User-agent: Roundcube Webmail/0.9.2

On 2021-03-12 07:33, Kristoffer Brånemyr via GNU coreutils General Discussion wrote:
> Hi,
> I was just wondering if you are planning to merge the change, or if
> you decided against it? :) I wanted to use the cpuid.h autoconf
> detection for another patch I'm working on.

Regarding the comment "Since the time the process spends
waiting on syscalls (fread) is still the same, actual real
time speedup is only 3x. It would be an interesting exercise
to try to use async IO, so you could checksum one block while
reading the next. Maybe I will try that one day."

You never know, but probably not. If the 3x speedup was
achieved with a hot cache, then async I/O probably isn't
going to do anything, since everything is in RAM already.
When the file is already in the page cache, the I/O syscalls are
pure CPU overhead, since nothing is waiting on any real I/O.

I would try these improvements, in order:

- Don't use stdio fread, which adds an extra layer of calls
  and buffering on top of read. Use read directly, and experiment
  with different buffer sizes.

- Use mmap to map the file into memory, and then compute the
  CRC32 over that buffer.
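The first suggestion might look like the following minimal sketch, assuming a POSIX system. cksum_read (a hypothetical name) calls read(2) directly into a buffer whose size is a parameter, so different sizes can be timed against each other. The bitwise crc_update/crc_finish helpers (also hypothetical) compute the POSIX cksum CRC (polynomial 0x04C11DB7, byte length appended, result complemented); they are a slow stand-in for the PCLMUL code in the patch, and only the I/O loop is the point here.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Bitwise CRC update for the POSIX cksum polynomial 0x04C11DB7
   (MSB-first, initial value 0). Slow stand-in for a PCLMUL kernel. */
static uint32_t crc_update(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* cksum-style finish: feed the byte length (least significant byte
   first, as few bytes as needed), then complement the result. */
static uint32_t crc_finish(uint32_t crc, uint64_t len)
{
    for (; len; len >>= 8) {
        unsigned char b = (unsigned char)(len & 0xFF);
        crc = crc_update(crc, &b, 1);
    }
    return ~crc;
}

/* Checksum a file with plain read(2) into a BUFSIZE-byte buffer,
   bypassing stdio. Returns 0 on success, -1 with errno set on error. */
static int cksum_read(const char *path, size_t bufsize,
                      uint32_t *crc_out, uint64_t *len_out)
{
    if (bufsize == 0) {
        errno = EINVAL;
        return -1;
    }
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char *buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        errno = ENOMEM;
        return -1;
    }
    uint32_t crc = 0;
    uint64_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            int saved = errno;
            free(buf);
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;                      /* end of file */
        crc = crc_update(crc, buf, (size_t)n);
        total += (uint64_t)n;
    }
    free(buf);
    close(fd);
    *crc_out = crc_finish(crc, total);
    if (len_out != NULL)
        *len_out = total;
    return 0;
}
```

Timing cksum_read on a cached file with buffer sizes ranging from a few KiB to a few MiB is one way to see how much of the remaining cost is per-syscall overhead.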
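The mmap suggestion can be sketched under similar assumptions: a POSIX system, regular files only, hypothetical helper names, and the same bitwise cksum CRC standing in for the PCLMUL kernel. cksum_mmap maps the whole file read-only and checksums it as one buffer. Empty files are special-cased because mmap rejects a zero length, and non-regular inputs such as pipes would still need a read() loop.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Bitwise CRC update for the POSIX cksum polynomial 0x04C11DB7
   (MSB-first, initial value 0). Slow stand-in for a PCLMUL kernel. */
static uint32_t crc_update(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* cksum-style finish: feed the byte length, low byte first, then complement. */
static uint32_t crc_finish(uint32_t crc, uint64_t len)
{
    for (; len; len >>= 8) {
        unsigned char b = (unsigned char)(len & 0xFF);
        crc = crc_update(crc, &b, 1);
    }
    return ~crc;
}

/* Map a regular file read-only and checksum the mapping in one pass.
   Returns 0 on success, -1 with errno set on error. */
static int cksum_mmap(const char *path, uint32_t *crc_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {         /* pipes, ttys, etc.: use read() */
        close(fd);
        errno = EINVAL;
        return -1;
    }
    size_t len = (size_t)st.st_size;
    uint32_t crc = 0;
    if (len > 0) {                      /* mmap of length 0 fails with EINVAL */
        void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) {
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        crc = crc_update(crc, map, len);
        munmap(map, len);
    }
    close(fd);
    *crc_out = crc_finish(crc, (uint64_t)len);
    return 0;
}
```

With a hot cache the mapping is backed directly by page-cache pages, so no data is copied into a user buffer. One caveat: if another process truncates the file while it is mapped, touching the vanished pages raises SIGBUS, which a production version would have to handle.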

In the non-hot-cache case, where async I/O might help, you can
likewise get a potential improvement with mmap by using madvise
with MADV_SEQUENTIAL to give the kernel a hint that you're performing
sequential access (which benefits from reading ahead).
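A sketch of the madvise variant, assuming a POSIX system that also provides the widely available (but non-POSIX) madvise call; cksum_mmap_seq and the CRC helpers are hypothetical names, and the bitwise CRC again stands in for the PCLMUL kernel. The file is mapped in windows whose size must be a multiple of the page size, because mmap offsets must be page-aligned, and each window gets MADV_SEQUENTIAL before it is checksummed. The advice only changes paging behaviour, not the data, so its return value is ignored.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Bitwise CRC update for the POSIX cksum polynomial 0x04C11DB7
   (MSB-first, initial value 0). Slow stand-in for a PCLMUL kernel. */
static uint32_t crc_update(uint32_t crc, const unsigned char *p, size_t n)
{
    while (n--) {
        crc ^= (uint32_t)*p++ << 24;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* cksum-style finish: feed the byte length, low byte first, then complement. */
static uint32_t crc_finish(uint32_t crc, uint64_t len)
{
    for (; len; len >>= 8) {
        unsigned char b = (unsigned char)(len & 0xFF);
        crc = crc_update(crc, &b, 1);
    }
    return ~crc;
}

/* Checksum a regular file WINDOW bytes at a time, advising the kernel
   that each mapping will be read sequentially. WINDOW must be a
   nonzero multiple of the page size. Returns 0, or -1 with errno set. */
static int cksum_mmap_seq(const char *path, size_t window, uint32_t *crc_out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || window == 0 || window % (size_t)page != 0) {
        errno = EINVAL;
        return -1;
    }
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        int saved = S_ISREG(st.st_mode) ? errno : EINVAL;
        close(fd);
        errno = saved;
        return -1;
    }
    uint64_t size = (uint64_t)st.st_size;
    uint32_t crc = 0;
    for (uint64_t off = 0; off < size; off += window) {
        size_t n = (size - off < window) ? (size_t)(size - off) : window;
        unsigned char *map = mmap(NULL, n, PROT_READ, MAP_PRIVATE, fd, (off_t)off);
        if (map == MAP_FAILED) {
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        /* Only a hint: the checksum is the same whether or not it succeeds. */
        (void)madvise(map, n, MADV_SEQUENTIAL);
        crc = crc_update(crc, map, n);
        munmap(map, n);
    }
    close(fd);
    *crc_out = crc_finish(crc, size);
    return 0;
}
```

Large windows (tens of MiB, say) give the kernel room to read ahead within each mapping; on a 64-bit system a single window covering the whole file is also reasonable. posix_madvise with POSIX_MADV_SEQUENTIAL is the portable spelling of the same hint.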
