Re: performance bug of `wc -m`
From: Kaz Kylheku (Coreutils)
Subject: Re: performance bug of `wc -m`
Date: Thu, 17 May 2018 18:04:07 -0700
User-agent: Roundcube Webmail/0.9.2
On 2018-05-13 15:05, Philip Rowlands wrote:
> In the slow case, wc is spending most of its time in iswprint /
> wcwidth / iswspace. Perhaps wc could learn a faster method of counting
> utf-8 (https://stackoverflow.com/a/7298149); this may be worthwhile as
> the trend to utf-8 everywhere marches on.
>
> I can't explain without more digging why Python's string
> decode('utf-8') is better optimised for length calculations.
On the surface, it seems easy to explain: the Python program is
just decoding UTF-8 and then taking the length. None of that
requires character classification and determination of display width.
If "wc -m" is doing something with display width, it's very different
from what the Python program is doing.
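To make the point concrete, here is a minimal sketch of counting characters with no classification at all. It assumes the input is already valid UTF-8 and simply counts the bytes that are not continuation bytes (those of the form 10xxxxxx). The function name count_utf8_chars is made up for illustration; this is not what GNU wc does, and it may or may not match the approach in the linked answer.

```python
def count_utf8_chars(data: bytes) -> int:
    """Count code points in valid UTF-8 without decoding or classifying.

    Assumption: `data` is well-formed UTF-8. Malformed input is not
    detected here; it would simply be miscounted.
    """
    # Every character starts with exactly one byte that is NOT of the
    # form 0b10xxxxxx, so counting those bytes counts the characters.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# 13 characters, 17 bytes: two 2-byte characters and one 3-byte character.
sample = "na\u00efve caf\u00e9 \u20ac5".encode("utf-8")
print(len(sample), count_utf8_chars(sample))
```

For well-formed input this agrees with len(data.decode('utf-8')), and nothing in it needs iswprint, iswspace or wcwidth.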
What are the requirements underpinning "wc -m", and how do these
iswprint and iswspace functions fit into it?
POSIX says this:

  The -c option stands for "character" count,
  even though it counts bytes. This stems from the sometimes erroneous
  historical view that bytes and characters are the same size.
  Due to international requirements, the -m option (reminiscent of
  "multi-byte") was added to obtain actual character counts.

I don't see how this amounts to having to call iswspace and all that.
Nowhere does POSIX say that the display width of a character
has to be obtained in "wc" and I don't see that in the GNU documentation
either.
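
As a rough model of the distinction POSIX draws between the two options, the sketch below treats -c as a byte count and -m as a count of decoded characters. The names wc_c and wc_m are hypothetical stand-ins rather than anything from coreutils, and the model assumes a UTF-8 locale and ignores how wc handles invalid sequences.

```python
def wc_c(data: bytes) -> int:
    """Model of `wc -c`: the number of bytes in the input."""
    return len(data)


def wc_m(data: bytes, encoding: str = "utf-8") -> int:
    """Model of `wc -m`: the number of characters after decoding.

    Assumption: the locale's encoding is UTF-8 and the input is valid;
    invalid input raises UnicodeDecodeError in this sketch.
    """
    return len(data.decode(encoding))


line = "h\u00e9llo\n".encode("utf-8")  # one 2-byte character plus five ASCII bytes
print(wc_c(line), wc_m(line))
```

The two counts differ only when the input contains multi-byte characters, and neither needs to know whether a character is printable or how wide it displays.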