Yes, it's true that the simplification and the speed-up from the larger bufsize are two
different things, although the former made the latter possible.
I just ran more tests with hyperfine across several configurations: the current master
version and the new approach with bufsizes from 16 KiB up to 1 MiB. The inputs were 1 billion
lines of `yes` output like you did (1by), a generated file for the recent 1 billion row challenge
(1brc, with entries like `<station name>;<temperature:0.2f>`), and the first 100 million
rows of each (100my and 100mrc, respectively), all in /dev/shm, yet again on a 7800X3D.
The reported timings are as follows:
| version | 100my | 100mrc | 1by | 1brc |
| ------- | ------- | ------- | ------- | ------- |
| master | 21.3 ms ± 1.0 ms | 163.1 ms ± 1.5 ms | 197.1 ms ± 3.0 ms | 1.680 s ± 0.010 s |
| 16 KiB | 21.0 ms ± 1.1 ms | 163.7 ms ± 2.1 ms | 194.3 ms ± 2.5 ms | 1.658 s ± 0.015 s |
| 32 KiB | 20.2 ms ± 0.7 ms | 158.9 ms ± 3.0 ms | 194.6 ms ± 6.4 ms | 1.620 s ± 0.023 s |
| 64 KiB | 19.8 ms ± 0.6 ms | 154.0 ms ± 5.3 ms | 187.5 ms ± 7.2 ms | 1.553 s ± 0.013 s |
| 128 KiB | 18.8 ms ± 0.6 ms | 148.9 ms ± 5.4 ms | 178.4 ms ± 1.3 ms | 1.530 s ± 0.013 s |
| 256 KiB | 19.2 ms ± 0.8 ms | 145.8 ms ± 1.5 ms | 176.4 ms ± 1.6 ms | 1.522 s ± 0.017 s |
| 512 KiB | 19.6 ms ± 0.7 ms | 146.4 ms ± 1.0 ms | 183.0 ms ± 5.0 ms | 1.512 s ± 0.014 s |
| 1 MiB | 19.3 ms ± 0.7 ms | 145.7 ms ± 1.8 ms | 188.4 ms ± 6.2 ms | 1.499 s ± 0.012 s |
And the corresponding speed-up values are as follows:
| version | 100my | 100mrc | 1by | 1brc |
| ------- | ------- | ------- | ------- | ------- |
| master | 0% | 0% | 0% | 0% |
| 16 KiB | 1% | 0% | 1% | 1% |
| 32 KiB | 5% | 3% | 1% | 4% |
| 64 KiB | 8% | 6% | 5% | 8% |
| 128 KiB | 13% | 10% | 10% | 10% |
| 256 KiB | 11% | 12% | 12% | 10% |
| 512 KiB | 9% | 11% | 8% | 11% |
| 1 MiB | 10% | 12% | 5% | 12% |
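For anyone who wants to redo the arithmetic: the percentages above are consistent with computing `master / new - 1` on the mean timings and rounding to a whole percent. That formula is an assumption inferred from the numbers, and the names `MEANS_MS`, `speedup_percent` and `speedup_table` are made up for this sketch. The ± spread is ignored here.

```python
# Mean timings copied from the timing table, all converted to milliseconds.
# Column order: 100my, 100mrc, 1by, 1brc.
MEANS_MS = {
    "master":  [21.3, 163.1, 197.1, 1680.0],
    "16 KiB":  [21.0, 163.7, 194.3, 1658.0],
    "32 KiB":  [20.2, 158.9, 194.6, 1620.0],
    "64 KiB":  [19.8, 154.0, 187.5, 1553.0],
    "128 KiB": [18.8, 148.9, 178.4, 1530.0],
    "256 KiB": [19.2, 145.8, 176.4, 1522.0],
    "512 KiB": [19.6, 146.4, 183.0, 1512.0],
    "1 MiB":   [19.3, 145.7, 188.4, 1499.0],
}


def speedup_percent(baseline_ms, candidate_ms):
    """Speed-up of candidate over baseline in whole percent.

    Assumed formula: baseline / candidate - 1, rounded to the nearest integer.
    """
    return round((baseline_ms / candidate_ms - 1.0) * 100.0)


def speedup_table(means):
    """Apply speedup_percent to every row, using the 'master' row as baseline."""
    baseline = means["master"]
    return {
        version: [speedup_percent(b, t) for b, t in zip(baseline, row)]
        for version, row in means.items()
    }


if __name__ == "__main__":
    for version, row in speedup_table(MEANS_MS).items():
        print(version, " ".join(f"{p}%" for p in row))
```

Note that the differences at 16 KiB are smaller than the reported standard deviations, so the 0-1% entries should be read as noise rather than as a real gain.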
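So again, in my case the new approach at 16 KiB is on par with master (within about 1%), while a
bufsize of 256 KiB seems to be the sweet spot: the gains reach 10-12% there on all four inputs,
and larger buffers no longer help consistently (1by drops back to 5% at 1 MiB).
Still, more testing on different CPUs and sample files should probably be
conducted.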