Re: [PATCH] Speedup wc -l
From: Pádraig Brady
Subject: Re: [PATCH] Speedup wc -l
Date: Sun, 15 Mar 2015 22:18:50 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 15/03/15 21:14, Kristoffer Brånemyr wrote:
>
>>On Sunday, 15 March 2015 at 20:13, Pádraig Brady <address@hidden> wrote:
>>
>>
>>>On 15/03/15 08:33, Kristoffer Brånemyr wrote:
>>>
>>> Hi,
>>>
>>> I did some tests and found out you can actually beat memchr with a simple
>>> loop. Tests were done on an Intel Xeon E3-1231v3 (4*3.4GHz), on a 4GB file
>>> that was already cached in memory. Benchmarking was done simply with the
>>> 'time' command. I don't know how this code would run on other
>>> architectures, but I guess you could put it in an #ifdef?
>>>
>>> Coreutils 2.83 version, compiled with -O3:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m3.126s
>>> user 0m2.699s
>>> sys 0m0.429s
>>>
>>>
>>> Improved version compiled with -O2:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m2.857s
>>> user 0m2.461s
>>> sys 0m0.396s
>>>
>>> Improved version compiled with -O3:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m1.518s
>>> user 0m1.157s
>>> sys 0m0.361s
>>>
>>> I studied the generated assembly and with -O3 gcc generates some fancy SSE
>>> code, getting some nice speedups. memchr is also SSE optimized as far as I
>>> know, so it's interesting that this is so much faster, twice as fast
>>> actually.
>>>
>>> In case you don't like turning -O3 on for some reason (the default in
>>> coreutils is -O2, I think), the best version I could put together for -O2
>>> was this:
>>>
>>> Improved version 2, compiled with -O2:
>>> 507755520 /home/ztion/words
>>>
>>> real 0m2.206s
>>> user 0m1.827s
>>> sys 0m0.379s
>
>
>>Interesting. Thanks for the results.
>>I use 'gcc -march=native -g -O3' locally, and with that can't see a
>>difference in performance.
>>
>>What version of glibc and gcc are you using?
>>gcc-4.9.2-1.fc21.x86_64 and glibc-2.20-7.fc21.x86_64 here.
>>
>>thanks,
>>Pádraig.
>
>
> Hi,
>
> This is with gcc 4.9.2-7 and glibc 2.19-17 on Debian amd64. The difference is
> still there for me when compiling with your CFLAGS. Have they improved memchr
> in glibc 2.20? I don't think they have that yet in Debian, unfortunately.
>
> What cpu do you have?

i3-2310M

I was doing a very quick test with _short_ lines,
specifically /usr/share/dict/words.

Note that GCC should be using __builtin_memchr here, so it is
not hitting the function call overhead.

I'll look in more detail later.
thanks,
Pádraig.
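
For readers following the thread, the two approaches being compared can be sketched roughly as below. This is only an illustration, not the patch under discussion and not the actual wc source: the function names count_lines_memchr and count_lines_loop are made up for this example, and both operate on a buffer that is assumed to be already in memory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count newlines by searching for each one with
   memchr, i.e. one memchr call per line.  With short lines the per-call
   cost is paid very often.  */
static size_t
count_lines_memchr (const char *buf, size_t len)
{
  size_t lines = 0;
  const char *p = buf;
  const char *end = buf + len;

  while ((p = memchr (p, '\n', (size_t) (end - p))) != NULL)
    {
      ++p;      /* Step past the newline just found.  */
      ++lines;
    }
  return lines;
}

/* Hypothetical helper: count newlines with a plain byte loop.  There is
   no data-dependent branch in the body, so a compiler may be able to
   vectorize it at higher optimization levels.  */
static size_t
count_lines_loop (const char *buf, size_t len)
{
  size_t lines = 0;

  for (size_t i = 0; i < len; i++)
    lines += (buf[i] == '\n');
  return lines;
}
```

Both functions return the same count for any buffer; which one is faster presumably depends on the average line length, the compiler flags, and the libc memchr implementation, which is what the timings quoted above are probing.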