Coreutils-gotchas (was:Re: bug#22045: expr substr ...)
From: Assaf Gordon
Subject: Coreutils-gotchas (was:Re: bug#22045: expr substr ...)
Date: Sun, 29 Nov 2015 01:34:10 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 11/29/2015 12:16 AM, Pádraig Brady wrote:
> I must collate some gotchas like this.
> Initial list started at:
> http://www.pixelbeat.org/docs/coreutils-gotchas.html
Fantastic list!
I would suggest adding four 'wc' entries:
1. "wc -l" counts newline characters, so a file with text but no newline character reports zero.
$ printf "hello world" | wc -l
0
2. "wc -l" on a file whose last line doesn't end with a newline
reports one less than naively expected:
$ printf "hello\nworld" | wc -l
1
3. "wc -L" counts "screen display width" (expanding each tab to the
next multiple of 8 columns), not characters.
$ printf "ab\txyz\n" | wc -L
11
$ printf "abc\txyz\n" | wc -L
11
$ printf "abcd\txyz\n" | wc -L
11
4. "wc -L" counts only valid, printable characters, including Unicode ones.
# valid UTF-8 sequence counted as one character:
$ printf "\xe2\x99\xa5" | wc -L
1
# invalid UTF-8 sequence not counted:
$ printf "\xe2\xf2\xa5" | wc -L
0
# unprintable characters (in C locale) are not counted:
$ printf "\xe2\x99\xa5" | LC_ALL=C wc -L
0
# To count bytes, replace each byte with '.' using sed in the C locale:
$ printf "\xe2\x99\xa5" | LC_ALL=C sed 's/././g' | wc -L
3
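For entries 1, 2 and the byte count in entry 4, one possible workaround is to let awk do the counting, since awk counts records rather than newline characters. This is only a minimal sketch: the helper names count_lines and max_line_bytes are hypothetical (they are not part of coreutils), and it assumes a POSIX awk whose length() returns bytes in the C locale. Octal escapes are used because "\x" is not portable across printf implementations.

```shell
# count_lines: hypothetical helper, not part of coreutils.
# awk's NR counts records, so a final line without a trailing
# newline is still counted.
count_lines() {
    awk 'END { print NR }'
}

# max_line_bytes: hypothetical helper, not part of coreutils.
# Prints the length in bytes of the longest input line; LC_ALL=C is
# assumed to make awk's length() count bytes rather than characters.
max_line_bytes() {
    LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }'
}

printf 'hello world' | count_lines          # prints 1 (wc -l prints 0)
printf 'hello\nworld' | count_lines         # prints 2 (wc -l prints 1)
printf '\342\231\245' | max_line_bytes      # prints 3 (same bytes as \xe2\x99\xa5)
```

Unlike "wc -L", the second helper does not expand tabs: each tab counts as a single byte.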
These are based on your answer from a while back:
http://lists.gnu.org/archive/html/coreutils/2015-05/msg00013.html
Thanks!
- Assaf