coreutils

Re: performance bug of `wc -m`


From: Eric Fischer
Subject: Re: performance bug of `wc -m`
Date: Thu, 17 May 2018 18:56:15 -0700

On Thu, May 17, 2018 at 6:04 PM, Kaz Kylheku (Coreutils) <address@hidden> wrote:

> What are the requirements underpinning "wc -m", and how do these
> iswprint and iswspace functions fit into it?
>
…

> Nowhere does POSIX say that the display width of a character
> has to be obtained in "wc" and I don't see that in the GNU documentation
> either.


The GNU requirement for calling wcwidth (if -L is specified) comes from the
claim in coreutils.texi (added in f325d180c, in 2008) that "[t]he line
lengths here are measured in screen columns, according to the current locale
and assuming tab positions in every 8th column." wc cannot know the screen
columns without calling wcwidth to measure the characters.
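
To make the -L point concrete, here is a minimal sketch of measuring line
length in screen columns. It is not the coreutils code: max_line_columns is
a hypothetical helper, it works on an already-decoded wide string, and it
ignores details such as carriage returns and form feeds. The only library
call it relies on is the POSIX wcwidth.

```c
#define _XOPEN_SOURCE 700   /* exposes wcwidth() in <wchar.h> */
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (an assumption for illustration, not wc's own code):
   return the width, in screen columns, of the widest line in s. */
static size_t max_line_columns(const wchar_t *s)
{
    assert(s != NULL);
    size_t col = 0, widest = 0;

    for (; *s != L'\0'; s++) {
        if (*s == L'\n') {
            if (col > widest)
                widest = col;
            col = 0;
        } else if (*s == L'\t') {
            col += 8 - (col % 8);       /* tab stops in every 8th column */
        } else {
            int w = wcwidth(*s);        /* -1 for nonprintable characters */
            if (w > 0)
                col += (size_t)w;
        }
    }
    if (col > widest)                   /* last line may lack a newline */
        widest = col;
    return widest;
}
```

The per-character wcwidth call is the cost in question: without it there is
no way to tell a one-column character from a two-column or zero-column one.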

The POSIX requirement for calling iswspace (if -w is specified) comes from
the POSIX definition of a word as a "non-zero-length string of characters
delimited by white space." wc cannot know whether a multibyte character
counts as white space without calling iswspace.
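
As a rough illustration of the -w point, word counting under that definition
can be sketched as below. count_words is a hypothetical name, the input is
assumed to be an already-decoded wide string, and this is not how wc itself
is structured; it only shows where iswspace has to be consulted.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (an assumption for illustration): count maximal runs
   of characters that are not white space, as classified by iswspace(). */
static size_t count_words(const wchar_t *s)
{
    assert(s != NULL);
    size_t words = 0;
    int in_word = 0;

    for (; *s != L'\0'; s++) {
        if (iswspace((wint_t)*s)) {
            in_word = 0;                /* white space ends the current word */
        } else if (!in_word) {
            in_word = 1;                /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

Which characters iswspace accepts depends on the current locale, which is
why the check cannot be reduced to a fixed list of bytes.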

Neither requirement really ought to depend on whether -m is specified; -m
(at least in POSIX) only determines whether wc counts characters instead of
bytes.
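
For comparison, counting characters rather than bytes needs nothing beyond
decoding. A minimal sketch, assuming a hypothetical count_chars helper built
on the standard mbrtowc, and assuming (my choice here, not a claim about wc)
that invalid bytes are skipped without being counted:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (an assumption for illustration): count the multibyte
   characters in buf[0..len) under the current locale's encoding. */
static size_t count_chars(const char *buf, size_t len)
{
    assert(buf != NULL);
    mbstate_t st;
    memset(&st, 0, sizeof st);          /* start in the initial shift state */
    size_t chars = 0, i = 0;

    while (i < len) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, buf + i, len - i, &st);

        if (n == (size_t)-2)
            break;                      /* incomplete sequence at buffer end */
        if (n == (size_t)-1) {
            memset(&st, 0, sizeof st);  /* invalid byte: reset, skip one byte */
            i++;
            continue;
        }
        if (n == 0)
            n = 1;                      /* a decoded L'\0' occupies one byte */
        chars++;
        i += n;
    }
    return chars;
}
```

Nothing in this loop asks whether a character is printable, how wide it is,
or whether it is white space.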

Eric

