Re: performance bug of `wc -m`
From: Pádraig Brady
Subject: Re: performance bug of `wc -m`
Date: Fri, 18 May 2018 18:45:46 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 18/05/18 14:06, Eric Fischer wrote:
> For whatever it's worth, the system wcwidth seems to be much faster on my
> MacOS X system (10.11.6) than the replacement wcwidth. Using the same
> benchmark as above, it takes about 0.9 seconds with the replacement wcwidth:
>
> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt
> $ yes 12345678901234567890 | head -n100000 > num.txt
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real    0m1.004s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real    0m0.909s
>
> $ time src/wc -Lm < num.txt
> 2100000 20
> real    0m0.903s
>
> $ time src/wc -m < num.txt
> 2100000
> real    0m0.887s
>
> and about 0.03 or 0.09 seconds with the system wcwidth (tested by adding
> return wcwidth (wc); to the top of the lib/wcwidth.c replacement):
>
> $ time src/wc -Lm < mbc.txt
> 2100000 20
> real    0m0.098s
>
> $ time src/wc -m < mbc.txt
> 2100000
> real    0m0.088s
>
> $ time src/wc -Lm < num.txt
> 2100000 20
> real    0m0.038s
>
> $ time src/wc -m < num.txt
> 2100000
> real    0m0.032s
>
> Unfortunately the replacement wcwidth is probably necessary for correct text
> measuring. The original MacOS X 10.3 bug where COMBINING ACUTE ACCENT
> reported a width of 1 instead of 0 appears to be fixed, but two other bugs
> that the m4/wcwidth.m4 test looks for (HEBREW POINT SHEVA and ZERO WIDTH
> SPACE reporting widths of 1 instead of 0) appear to still be current.

Interesting.

On the off chance it might have been clang, I checked with:

  gl_cv_func_wcwidth_works=no CC=clang ./configure --quiet

and still got fast results with uc_width() on glibc.

Now the gnulib replacement is only a table lookup and some bit manipulation.
Ah, it also calls locale_charset()!
That must be slow on OS X. Indeed :(

  https://lists.gnu.org/archive/html/bug-gnulib/2015-01/msg00040.html
  https://lists.gnu.org/archive/html/bug-gnulib/2015-02/msg00000.html

I see some recent improvement (which the latest coreutils git should reference):

  https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00057.html

It still would be nice to get appropriate caching here.

cheers,
Pádraig
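
The caching idea mentioned above could look roughly like the following.
This is only a minimal sketch under stated assumptions, not gnulib's code:
every name in it is hypothetical, and nl_langinfo (CODESET) stands in for
locale_charset() so that the example builds without gnulib. It remembers
whether the encoding is UTF-8 after the first lookup, and counts the
expensive lookups so the effect of the cache is visible.

```c
#include <langinfo.h>
#include <string.h>

/* Hypothetical names throughout; none of this is gnulib code.  */

static int cached_is_utf8 = -1;   /* -1 means "not computed yet" */
static int slow_lookup_calls = 0; /* counts expensive lookups, for illustration */

/* Stand-in for the expensive call.  The thread is about locale_charset();
   nl_langinfo (CODESET) is an assumption made so this builds standalone.  */
static int
lookup_is_utf8 (void)
{
  slow_lookup_calls++;
  return strcmp (nl_langinfo (CODESET), "UTF-8") == 0;
}

/* Return the cached answer, doing the lookup only on first use.  */
static int
encoding_is_utf8 (void)
{
  if (cached_is_utf8 < 0)
    cached_is_utf8 = lookup_is_utf8 ();
  return cached_is_utf8;
}

/* Must be called after any setlocale() call, or the cache goes stale.  */
static void
invalidate_encoding_cache (void)
{
  cached_is_utf8 = -1;
}
```

A per-character function would then call encoding_is_utf8 () instead of
querying the charset each time. The sketch is not thread-safe, and it
leaves invalidation to the caller; a real fix would need some way to
notice locale changes, which is the hard part.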