Re: wc: expand help of '-L' (and a question)

coreutils

From:	Stephane Chazelas
Subject:	Re: wc: expand help of '-L' (and a question)
Date:	Wed, 13 May 2015 13:01:12 +0100
User-agent:	Mutt/1.5.21 (2010-09-15)

2015-05-13 03:00:48 +0100, Pádraig Brady:
[...]
> Yes. You could filter with sed to adjust:
> 
>          sed 's/././g' | wc -L    # count chars
> LC_ALL=C sed 's/././g' | wc -L    # count bytes
[...]

Note that Unicode code points D800 to DFFF (reserved for UTF-16
encoding) and 110000 to 7FFFFFFF (now that they've given up on
ever having anything above 10FFFF) are not characters.
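
To make those ranges concrete, here is a minimal sketch in POSIX
sh arithmetic. The function name is_unicode_char is hypothetical,
and "character" here is assumed to mean only "a code point in
0..10FFFF outside the surrogate range"; it does not consult any
locale or sed.

```shell
# is_unicode_char HEX (hypothetical helper): succeed if the code point
# lies in 0..10FFFF and outside the surrogate range D800..DFFF.
is_unicode_char() {
  cp=$((0x$1))
  [ "$cp" -ge 0 ] && [ "$cp" -le $((0x10FFFF)) ] &&
    { [ "$cp" -lt $((0xD800)) ] || [ "$cp" -gt $((0xDFFF)) ]; }
}

# Check the boundaries of both excluded ranges.
for cp in 41 D7FF D800 DFFF E000 10FFFF 110000 7FFFFFFF; do
  if is_unicode_char "$cp"; then
    echo "U+$cp: in the character range"
  else
    echo "U+$cp: not a character"
  fi
done
```

Under that assumption, D800, DFFF, 110000 and 7FFFFFFF are the
ones reported as not being characters.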

Still, GNU sed treats their UTF-8 encodings (as per the original
UTF-8 definition, which allowed sequences of up to 6 bytes before
it got limited to 4 bytes) as characters.
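
As a rough illustration of what "original UTF-8 encoding" implies,
the sketch below computes how many bytes a code point needs under
the original scheme of up to 6 bytes. The function name utf8_len
is hypothetical; it is plain arithmetic on the encoding's range
boundaries and says nothing about how sed itself decodes input.

```shell
# utf8_len HEX (hypothetical helper): print the number of bytes the
# original (up to 6-byte) UTF-8 scheme uses for a code point.
utf8_len() {
  cp=$((0x$1))
  if   [ "$cp" -lt $((0x80)) ];      then echo 1
  elif [ "$cp" -lt $((0x800)) ];     then echo 2
  elif [ "$cp" -lt $((0x10000)) ];   then echo 3
  elif [ "$cp" -lt $((0x200000)) ];  then echo 4
  elif [ "$cp" -lt $((0x4000000)) ]; then echo 5
  else                                    echo 6
  fi
}

# The four code points used in the printf test of this message.
for cp in D800 DFFF 110000 7FFFFFFF; do
  echo "U+$cp: $(utf8_len "$cp") byte(s)"
done
```

By this arithmetic, 110000 still fits in 4 bytes while 7FFFFFFF
needs the full 6.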

$ printf '\ud800\udfff\U110000\U7fffffff\n' | sed s/././g | wc -L
4

(I'm not sure I'd object to that though).

Other byte sequences that don't form valid characters are not
counted as characters:

$ printf '\x80\xff' | sed s/././g | wc -L
0

-- 
Stephane
