coreutils

Re: Coreutils-gotchas


From: Pádraig Brady
Subject: Re: Coreutils-gotchas
Date: Sun, 29 Nov 2015 11:43:46 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 29/11/15 06:34, Assaf Gordon wrote:

>   3. "wc -L" counts "screen display width" (while expanding tabs),
>      not characters.
> 
>       $ printf "ab\txyz\n" | wc -L
>       11
>       $ printf "abc\txyz\n" | wc -L
>       11
>       $ printf "abcd\txyz\n" | wc -L
>       11
> 
>   4. "wc -L" counts only valid, printable characters, including Unicode.
> 
>       # valid UTF-8 sequence counted as one character:
>       $ printf "\xe2\x99\xa5" | wc -L
>       1
> 
>       # invalid UTF-8 sequence not counted:
>       $ printf "\xe2\xf2\xa5" | wc -L
>       0
> 
>       # unprintable characters (in C locale) are not counted:
>       $ printf "\xe2\x99\xa5" | LC_ALL=C wc -L
>       0
> 
>       # To measure length in bytes, replace each byte with '.' in the C locale:
>       $ printf "\xe2\x99\xa5" | LC_ALL=C sed 's/././g' | wc -L
>       3
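
The tab behaviour in point 3 can be reproduced with a short sketch. It assumes GNU coreutils and the default 8-column tab stops. expand(1) performs the same expansion, so running it first leaves the result unchanged, while translating each tab to a single space with tr counts a tab as one column:

```shell
# A tab advances to the next multiple of 8 columns:
# "ab" ends at column 2, the tab moves to column 8, "xyz" ends at 11.
printf 'ab\txyz\n' | wc -L                 # 11: display width, tab expanded
printf 'ab\txyz\n' | expand | wc -L        # 11: expand uses the same tab stops
printf 'ab\txyz\n' | tr '\t' ' ' | wc -L   # 6: tab counted as a single column
```

Note the tr variant still measures display width for everything else (double-width characters count as 2), so it only approximates a character count for width-1 text.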


Actually, you're right: we should call out some of the above as examples.
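We should also mention that wc doesn't treat terminal control characters specially. The ESC and backspace bytes are skipped as non-printable, but the rest of each escape sequence ("[33m" and "[m") is counted along with "fred", giving 10 rather than the 3 columns a terminal displays: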

$ printf '\x1b[33mf\bred\x1b[m\n' | wc -L
10
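
If the aim is the visible width of coloured output, one option is to strip the escape sequences before measuring. The sketch below defines a hypothetical helper, strip_sgr (not part of coreutils), and assumes the only escapes present are SGR colour sequences of the form ESC [ parameters m. Other control sequences are left alone, and the backspace overstrike is still not interpreted:

```shell
# Hypothetical helper: delete SGR sequences, i.e. ESC '[' followed by
# digits/semicolons and a final 'm'. Other escape forms are not handled.
strip_sgr() {
  sed "s/$(printf '\033')\[[0-9;]*m//g"
}

# Prints 4: "f" and "red" remain and the backspace is skipped as
# non-printable, although a terminal would show only 3 columns ("red").
printf '\033[33mf\bred\033[m\n' | strip_sgr | wc -L
```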

cheers,
Pádraig


