Re: wc: expand help of '-L' (and a question)
From: Pádraig Brady
Subject: Re: wc: expand help of '-L' (and a question)
Date: Wed, 13 May 2015 03:00:48 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0

On 25/04/15 03:38, Assaf Gordon wrote:
> Hello,
>
> Would you be willing to add the following patch, mentioning tab-expansion
> and multibyte counting of '-L' in the "--help" screen and the manual?
> Currently this is mentioned only in one sentence at the end of a long
> paragraph, and is easily missed.
> My wording could be improved, but I hope this will help prevent confusion
> with 'wc -L' output.
Wow, the current wording is confusing/ambiguous.
I'll apply the attached in your name.
>
> Somewhat related:
> I seem to get an unexpected result with '-L' when forcing the C locale.
> Perhaps I'm doing something wrong, or there are more intricate details to '-L'?
>
> # This is a Unicode Character 'BLACK HEART SUIT' (U+2665)
> $ printf "\xe2\x99\xa5\n"
>
> # Counting characters with a UTF-8 locale gives 1,
> # counting bytes gives 3,
> # and the longest line is 1, as expected:
> $ printf "\xe2\x99\xa5" | LC_ALL=en_US.UTF-8 wc -cmL
> 1 3 1
>
>
> # using C locale, characters=bytes=3,
> # but longest line is 0 ?
> $ printf "\xe2\x99\xa5" | LC_ALL=C wc -cmL
> 3 3 0
>
> This could be because of wc.c line 492, where "isprint" is called on each
> byte (e.g. isprint('\xe2') is false),
> and so these characters are not counted at all?
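Yes, bytes that fail isprint() are not counted. You could filter through
sed first, so that every character (or byte) becomes a printable '.':
  sed 's/././g' | wc -L            # count chars
  LC_ALL=C sed 's/././g' | wc -L   # count bytes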
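
As a rough sketch of the byte-counting variant applied to the heart example
quoted above (assuming GNU sed and GNU wc; the function name
longest_line_bytes is made up for illustration, and the bytes are written as
octal escapes so that printf also works in shells without \x support):

```shell
# Hypothetical helper, not part of coreutils: report the longest line
# length in bytes by turning every byte into a printable '.' first.
longest_line_bytes() {
    # LC_ALL=C makes sed match '.' against single bytes rather than
    # multibyte characters; the dots are plain ASCII, so wc -L then
    # counts one column per original byte.
    LC_ALL=C sed 's/././g' | wc -L
}

# U+2665 BLACK HEART SUIT is the three bytes e2 99 a5 (octal 342 231 245).
printf '\342\231\245\n' | longest_line_bytes   # prints 3
```

Without the sed filter, the same input under LC_ALL=C gives 0 for -L, as in
the transcript above.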
cheers,
Pádraig.
wc-L-clarify.patch
Description: Text Data