> On 15/03/15 21:14, Kristoffer Brånemyr wrote:
>>
>>
>>
>>
>>>On Sunday, 15 March 2015 20:13, Pádraig Brady <address@hidden> wrote:
>>>
>>>
>>>>On 15/03/15 08:33, Kristoffer Brånemyr wrote:
>>>>
>>>> Hi,
>>>>
>>>> I did some tests and found out you can actually beat memchr with a simple loop. Tests were done on an Intel Xeon E3-1231v3 (4*3.4GHz), on a 4GB file that was already cached in memory. Benchmarking was done simply with the 'time' command. I don't know how this code would run on other architectures, but I guess you could put it in an #ifdef?
>>>>
>>>> Coreutils 8.23 version, compiled with -O3:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real 0m3.126s
>>>> user 0m2.699s
>>>> sys 0m0.429s
>>>>
>>>>
>>>> Improved version compiled with -O2:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real 0m2.857s
>>>> user 0m2.461s
>>>> sys 0m0.396s
>>>>
>>>> Improved version compiled with -O3:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real 0m1.518s
>>>> user 0m1.157s
>>>> sys 0m0.361s
>>>>
>>>> I studied the generated assembly and with -O3 gcc generates some fancy SSE code, getting some nice speedups. memchr is also SSE optimized as far as I know, so it's interesting that this is so much faster, twice as fast actually.
>>>>
>>>> In case you don't like turning -O3 on for some reason (the default in coreutils is -O2, I think), the best version I could put together for -O2 was this:
>>>>
>>>> Improved version 2, compiled with -O2:
>>>> 507755520 /home/ztion/words
>>>>
>>>> real 0m2.206s
>>>> user 0m1.827s
>>>> sys 0m0.379s
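
The patch itself is not quoted in this part of the thread. As a rough sketch of the comparison being discussed, assuming the task is counting newline bytes in a buffer, the two approaches might look like the following. The function names `count_newlines_loop` and `count_newlines_memchr` are hypothetical and do not come from coreutils or from the original patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the patch): count '\n' bytes with a
   plain byte loop. A loop of this shape is a candidate for GCC's
   auto-vectorizer at -O3. */
size_t count_newlines_loop(const char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += (buf[i] == '\n');
    return count;
}

/* Hypothetical helper (not from the patch): count '\n' bytes by
   calling memchr once per line. */
size_t count_newlines_memchr(const char *buf, size_t len)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + len;

    /* Each call scans from p to the next newline or to the end. */
    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        count++;
        p++; /* step past the newline just found */
    }
    return count;
}
```

Both functions return the same count for any buffer. The memchr variant makes one call per line, so its relative cost may depend on line length, which is presumably why short-line and long-line inputs can give different results.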
>>
>>
>>>Interesting. Thanks for the results.
>>>I use 'gcc -march=native -g -O3' locally, and with that can't see a difference in performance.
>>>
>>>What version of glibc and gcc are you using?
>>>gcc-4.9.2-1.fc21.x86_64 and glibc-2.20-7.fc21.x86_64 here.
>>>
>>>thanks,
>>>Pádraig.
>>
>>
>> Hi,
>>
>> This is with gcc 4.9.2-7 and glibc 2.19-17 on Debian amd64. The difference is still there for me when compiling with your CFLAGS. Have they improved memchr in glibc 2.20? I don't think they have that in Debian yet, unfortunately.
>>
>> What cpu do you have?
>
>
> i3-2310M
>
> I was doing a very quick test with _short_ lines,
> specifically /usr/share/dict/words.
>
> Note GCC should be using __builtin_memchr here, so it is not
> hitting the function call overhead.
>
> I'll look in more detail later.
I'm using an up-to-date Arch Linux (testing).
For reference, my test input had the following data: