coreutils

Re: performance bug of `wc -m`


From: Eric Fischer
Subject: Re: performance bug of `wc -m`
Date: Fri, 18 May 2018 14:06:14 -0700

For whatever it's worth, the system wcwidth seems to be much faster on my
MacOS X system (10.11.6) than the replacement wcwidth. Using the same
benchmark as above, each run takes about 0.9 to 1.0 seconds with the
replacement wcwidth:

$ yes áááááááááááááááááááá | head -n100000 > mbc.txt
$ yes 12345678901234567890 | head -n100000 > num.txt

$ time src/wc -Lm < mbc.txt
2100000      20
real 0m1.004s

$ time src/wc -m < mbc.txt
2100000
real 0m0.909s

$ time src/wc -Lm < num.txt
2100000      20
real 0m0.903s

$ time src/wc -m < num.txt
2100000
real 0m0.887s

and about 0.03 to 0.04 seconds (ASCII digits) or 0.09 to 0.10 seconds
(multibyte text) with the system wcwidth (tested by adding
`return wcwidth (wc);` to the top of the lib/wcwidth.c replacement):

$ time src/wc -Lm < mbc.txt
2100000      20
real 0m0.098s

$ time src/wc -m < mbc.txt
2100000
real 0m0.088s

$ time src/wc -Lm < num.txt
2100000      20
real 0m0.038s

$ time src/wc -m < num.txt
2100000
real 0m0.032s

Unfortunately the replacement wcwidth is probably necessary for correct
text measurement. The original MacOS X 10.3 bug, where COMBINING ACUTE
ACCENT (U+0301) reported a width of 1 instead of 0, appears to be fixed,
but two other bugs that the m4/wcwidth.m4 test looks for (HEBREW POINT
SHEVA, U+05B0, and ZERO WIDTH SPACE, U+200B, both reporting a width of 1
instead of 0) still appear to be present.

Eric

