From: Pádraig Brady
Subject: bug#47702: wc man page: first you are talking about bytes, then you are talking about characters
Date: Sun, 11 Apr 2021 16:50:35 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:84.0) Gecko/20100101 Thunderbird/84.0

On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote:
> Man wc says
>
>     Print newline, word, and byte counts for each FILE, and a total line if
>     more than one FILE is specified. A word is a non-zero-length sequence
>     of characters delimited by white space.
>
> first you are talking about bytes, then you are talking about
> characters.
> So for the latter, please say
>     characters (not bytes)
> or
>     characters (same as bytes)
> or just
>     bytes
> Yes, even if explained in the INFO file.
You're right that this is under-specified,
in both the man page and the info file.
The word definition above really does mean characters (not bytes).
In fact, as a GNU extension, a word is a sequence of printable characters.
POSIX does not specify this, but one can confirm it like so:
$ printf '\xc3 \xc3' | LC_ALL=C wc --words --chars --bytes
0 3 3
$ printf '\xc3 \xc3' | LC_ALL=C.utf8 wc --words --chars --bytes
0 1 3
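The same distinction can be probed with valid UTF-8 input. The following is only a minimal sketch and not part of the original report: `wc_counts` is a hypothetical helper, the locale name `C.utf8` is an assumption that varies by system, and octal escapes are used because `\x` escapes in `printf` are not required by POSIX. Word counts for bytes that are not printable in the current locale may differ between wc implementations and versions, so no expected output is shown.

```shell
# Hypothetical helper (not part of coreutils): print the word, character
# and byte counts that wc reports for a string under a given locale.
#   $1 = locale name, $2 = printf format string holding the test data
wc_counts() {
    wcc_w=$(printf "$2" | LC_ALL="$1" wc -w)
    wcc_m=$(printf "$2" | LC_ALL="$1" wc -m)
    wcc_c=$(printf "$2" | LC_ALL="$1" wc -c)
    # $(( ... )) strips the padding that some wc implementations add.
    echo "$((wcc_w)) $((wcc_m)) $((wcc_c))"
}

# 'é x' encoded as UTF-8: two bytes for 'é', a space, and 'x' (4 bytes).
# In the C locale every byte is treated as one character.
wc_counts C '\303\251 x'

# In a UTF-8 locale 'é' is a single character, so the character count
# should drop below the byte count. If the locale named here does not
# exist on the system, wc typically falls back to C locale behavior.
wc_counts C.utf8 '\303\251 x'
```

Each call prints three numbers in the order words, characters, bytes, matching the order wc itself uses.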
The info file was really quite under-specified in this regard.
I'll apply the attached to clarify things.
Marking this as done.
thanks!
Pádraig
Attachment: wc-clarify-counts.patch (text data)