Re: [PATCH] wc: speed-up by simplifying avx code
From: Pádraig Brady
Subject: Re: [PATCH] wc: speed-up by simplifying avx code
Date: Sun, 31 Mar 2024 13:12:31 +0100
User-agent: Mozilla Thunderbird
On 31/03/2024 00:18, Evgeny Nizhibitsky wrote:
> Here is the proposed patch for both simplifying and consistently speeding up
> the avx version of wc -l by 10% in up to 1 billion rows scenarios on 7800X3D
> (probably should be tested on different data samples and CPUs).
The patch was mangled, but I manually applied it.
Probably best to attach rather than pasting any further patches.
Attaching here in case others want to try.
This is good as it simplifies the code,
and it should be just as portable across machines and compilers.
I'll adjust the configure.ac check to be more aligned.
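For anyone following along without the attachment open, here is a rough sketch of the general compare / movemask / popcount approach to counting newlines with AVX2. This is not the attached patch: the function names and the SKETCH_HAVE_AVX2 macro are made up for illustration, and it assumes GCC or Clang (for the target attribute and the __builtin_* functions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined __x86_64__ || defined __i386__
# include <immintrin.h>
# define SKETCH_HAVE_AVX2 1   /* hypothetical macro, for this sketch only */
#endif

/* Scalar fallback: count '\n' bytes one at a time.  */
static size_t
count_newlines_scalar (const unsigned char *buf, size_t len)
{
  size_t lines = 0;
  for (size_t i = 0; i < len; i++)
    lines += buf[i] == '\n';
  return lines;
}

#ifdef SKETCH_HAVE_AVX2
/* AVX2 path: compare 32 bytes against '\n' at once, collapse the
   result to a 32-bit mask, and count the set bits in that mask.  */
__attribute__ ((target ("avx2")))
static size_t
count_newlines_avx2 (const unsigned char *buf, size_t len)
{
  const __m256i nl = _mm256_set1_epi8 ('\n');
  size_t lines = 0;
  size_t i = 0;
  for (; i + 32 <= len; i += 32)
    {
      __m256i chunk = _mm256_loadu_si256 ((const __m256i *) (buf + i));
      __m256i eq = _mm256_cmpeq_epi8 (chunk, nl);
      uint32_t mask = (uint32_t) _mm256_movemask_epi8 (eq);
      lines += (size_t) __builtin_popcount (mask);
    }
  /* Handle the tail of fewer than 32 bytes with the scalar loop.  */
  return lines + count_newlines_scalar (buf + i, len - i);
}
#endif

/* Pick the AVX2 path at run time when the CPU supports it.  */
size_t
count_newlines (const unsigned char *buf, size_t len)
{
  assert (buf != NULL);
#ifdef SKETCH_HAVE_AVX2
  if (__builtin_cpu_supports ("avx2"))
    return count_newlines_avx2 (buf, len);
#endif
  return count_newlines_scalar (buf, len);
}
```

The run-time check keeps the sketch usable on CPUs without AVX2, which is the same portability concern that the configure.ac check addresses at build time.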
As for performance, I saw no change when testing on my laptop:
# on an i7-5600U with 1 billion short lines
$ yes | head -n1000000000 > /dev/shm/yes
$ time src/wc-old -l /dev/shm/yes
1000000000 /dev/shm/yes
real 0m0.351s
user 0m0.060s
sys 0m0.288s
$ time src/wc-new -l /dev/shm/yes
1000000000 /dev/shm/yes
real 0m0.356s
user 0m0.098s
sys 0m0.255s
Since you changed the I/O size from 16 KiB to 256 KiB,
it's more aligned with the recent I/O size adjustment in:
https://github.com/coreutils/coreutils/commit/fcfba90d0
In fact perhaps much of the speedup is just from that change.
Can you test on your system with the buffer reduced back to 16 KiB
to see how much that impacts the performance?
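To separate the buffer-size effect from the AVX change, something like the following stand-alone harness could be timed with each size. It is only a sketch: count_lines_fd is a hypothetical helper, not a function in wc, it uses a plain scalar loop, and it does not retry on EINTR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Count '\n' bytes read from FD using a buffer of BUFSIZE bytes.
   Returns -1 on allocation or read failure.  */
long long
count_lines_fd (int fd, size_t bufsize)
{
  assert (bufsize > 0);
  unsigned char *buf = malloc (bufsize);
  if (!buf)
    return -1;

  long long lines = 0;
  ssize_t n;
  /* Each read() fetches at most BUFSIZE bytes, so a larger buffer
     means fewer system calls for the same input.  */
  while ((n = read (fd, buf, bufsize)) > 0)
    for (ssize_t i = 0; i < n; i++)
      lines += buf[i] == '\n';

  free (buf);
  return n < 0 ? -1 : lines;
}
```

Calling it once with 16 * 1024 and once with 256 * 1024 on the same file (for example the /dev/shm/yes file above) and timing each run would give a rough idea of how much the I/O size alone matters.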
thanks,
Pádraig
Attachment: wc-popcount.patch
Description: Text Data