Re: Coreutils-gotchas
From: Pádraig Brady
Subject: Re: Coreutils-gotchas
Date: Sun, 29 Nov 2015 11:43:46 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 29/11/15 06:34, Assaf Gordon wrote:
> 3. "wc -L" counts "screen display width" (while expanding tabs),
> not characters.
>
> $ printf "ab\txyz\n" | wc -L
> 11
> $ printf "abc\txyz\n" | wc -L
> 11
> $ printf "abcd\txyz\n" | wc -L
> 11
>
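The arithmetic behind those three results can be sketched as below. This is only an illustration, assuming tab stops every 8 columns (which is what the outputs above imply); tab_width is a hypothetical helper, not anything wc provides.

```shell
# Hypothetical helper: display width of "<prefix><TAB><suffix>",
# assuming tab stops every 8 columns.
tab_width() {
    prefix_len=$1
    suffix_len=$2
    # A tab advances the column to the next multiple of 8.
    col=$(( (prefix_len / 8 + 1) * 8 ))
    echo $(( col + suffix_len ))
}

tab_width 2 3   # "ab\txyz"   -> 11
tab_width 4 3   # "abcd\txyz" -> 11
tab_width 8 3   # 8 chars before the tab -> 19
```

Any prefix of 0 to 7 characters lands the tab on column 8, which is why all three inputs report 11.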
> 4. "wc -L" counts only valid, printable characters, including Unicode.
>
> # valid UTF-8 sequence counted as one character:
> $ printf "\xe2\x99\xa5" | wc -L
> 1
>
> # invalid UTF-8 sequence not counted:
> $ printf "\xe2\xf2\xa5" | wc -L
> 0
>
> # unprintable characters (in C locale) are not counted:
> $ printf "\xe2\x99\xa5" | LC_ALL=C wc -L
> 0
>
> # To count bytes, use sed:
> $ printf "\xe2\x99\xa5" | LC_ALL=C sed 's/././g' | wc -L
> 3
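As a possible alternative to the sed trick, awk can report the longest line in bytes directly. This is a sketch, assuming an awk whose length() counts bytes under LC_ALL=C; max_line_bytes is a name made up for this example.

```shell
# Hypothetical helper: length in bytes of the longest input line.
# Assumes awk's length() counts bytes in the C locale.
max_line_bytes() {
    LC_ALL=C awk '{ if (length($0) > max) max = length($0) }
                  END { print max + 0 }'
}

# \342\231\245 is the same 3-byte UTF-8 heart as \xe2\x99\xa5 above.
printf '\342\231\245\nab\n' | max_line_bytes   # prints 3
```

The octal escapes are used only because not every printf understands \x.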
Actually, you're right: we should call out some of the above as examples.
We should also mention that wc doesn't process terminal control chars specially:
$ printf '\x1b[33mf\bred\x1b[m\n' | wc -L
10
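If the visible width is what's wanted, one workaround is to strip the colour sequences before measuring. A rough sketch, assuming GNU wc for -L and handling only SGR sequences of the form ESC [ ... m (not backspaces or other control functions); strip_sgr is a hypothetical helper:

```shell
# Hypothetical helper: delete SGR colour sequences (ESC [ params m) from stdin.
# Other control characters, such as backspace, are left untouched.
strip_sgr() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}

printf '\033[33mred\033[m\n' | wc -L               # 9: "[33m" + "red" + "[m"
printf '\033[33mred\033[m\n' | strip_sgr | wc -L   # 3: just "red"
```

The ESC byte itself is unprintable and so never counted; it's the printable remainder of each sequence that inflates the result.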
cheers,
Pádraig