bug-coreutils

From: Pádraig Brady
Subject: bug#47702: wc man page: first you are talking about bytes, then you are talking about characters
Date: Sun, 11 Apr 2021 16:50:35 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0

On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote:
> Man wc says
>
>         Print newline, word, and byte counts for each FILE, and a total line if
>         more than one FILE is specified.  A word is a non-zero-length  sequence
>         of characters delimited by white space.
>
> first you are talking about bytes, then you are talking about
> characters.
>
> So for the latter, please say
> characters (not bytes)
> or
> characters (same as bytes)
> or just
> bytes
> Yes, even if explained in the INFO file.

You're right that this is under-specified,
in both the man page and the info file.
The "characters" in that sentence really are characters (not bytes).
In fact, as a GNU extension, a word is a sequence of printable characters.
POSIX does not specify this, but one can confirm it like so:


$ printf '\xc3 \xc3' | LC_ALL=C wc --words --chars --bytes
      0       3       3
$ printf '\xc3 \xc3' | LC_ALL=C.utf8 wc --words --chars --bytes
      0       1       3
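
The same distinction can be seen with valid multibyte input. The following is only a sketch: the sample string is made up for illustration, and the C.UTF-8 locale name is an assumption that may not be installed on every system (if it is missing, wc silently behaves as in the C locale). In the C locale every byte is one character, so the character and byte counts agree; in a UTF-8 locale the two-byte "é" should count as a single character, making the character count one lower than the byte count.

```shell
# Hypothetical sample: "café au lait" plus a newline.
# 'é' is the two bytes 0xC3 0xA9 in UTF-8, written in octal here
# because POSIX printf does not define \x escapes.
sample='caf\303\251 au lait\n'

# C locale: words, characters, bytes. Characters equal bytes.
c_counts=$(printf "$sample" | LC_ALL=C wc -w -m -c)
echo "C locale:     $c_counts"    # 3 words, 14 characters, 14 bytes

# UTF-8 locale (assumed to be available under the name C.UTF-8):
# the character count drops below the byte count.
u_counts=$(printf "$sample" | LC_ALL=C.UTF-8 wc -w -m -c)
echo "UTF-8 locale: $u_counts"
```

The word count stays at 3 in both cases because the sample contains three runs of non-space characters separated by single spaces.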

The info file was really quite under-specified in this regard.
I'll apply the attached to clarify things.
Marking this as done.

thanks!
Pádraig

Attachment: wc-clarify-counts.patch
Description: Text Data