bug#20751: wc -m doesn't count UTF-8 characters properly

bug-coreutils

From:	Pádraig Brady
Subject:	bug#20751: wc -m doesn't count UTF-8 characters properly
Date:	Sat, 06 Jun 2015 22:43:28 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

tag 20751 notabug
close 20751
stop

On 06/06/15 19:49, Valdis Vītoliņš wrote:
>>> Version: wc (GNU coreutils) 8.21
>>>
>>> When 'wc -m' is invoked, it should print the character count, but it
>>> miscounts UTF-8 encoded characters. The attached files have 3, 4 and 6
>>> bytes in them, but all have only two UTF-8 encoded characters, which you
>>> can see with any modern text editor.
>>>
>>> wc -c shows the correct number of bytes:
>>> wc -c *
>>>  3 3bytes.txt
>>>  4 4bytes.txt
>>>  6 6bytes.txt
>>> 13 total
>>>
>>> But wc -m shows an incorrect number of characters:
>>> wc -m *
>>>  3 3bytes.txt
>>>  3 4bytes.txt
>>>  3 6bytes.txt
>>>  9 total
>>>
>>> But it should be:
>>> wc -m *
>>>  2 3bytes.txt
>>>  2 4bytes.txt
>>>  2 6bytes.txt
>>>  6 total

I think it's working correctly, i.e. the trailing \n is included in
the count: two visible characters plus the newline gives 3 per file.
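
To illustrate the point, here is a minimal Python sketch of the byte
and character counts. The file contents are assumptions: the thread
does not show what the attachments contain, so each sample is two
hypothetical characters followed by a newline, chosen to match the
reported sizes of 3, 4 and 6 bytes. The helper counts() is likewise
made up for this sketch and only approximates what wc -c and wc -m
report in a UTF-8 locale.

```python
# Hypothetical stand-ins for the attachments; their real contents are
# not shown in the thread. Each is two characters plus a newline.
samples = {
    "3bytes.txt": "ab\n",            # 1 + 1 + 1 bytes
    "4bytes.txt": "\u0101b\n",       # 2 + 1 + 1 bytes
    "6bytes.txt": "\u0101\u20ac\n",  # 2 + 3 + 1 bytes
}


def counts(text):
    """Return (byte_count, character_count) for text encoded as UTF-8."""
    data = text.encode("utf-8")
    # Decoding the bytes back yields one element per code point,
    # and the trailing newline is one of them.
    return len(data), len(data.decode("utf-8"))


for name, text in samples.items():
    n_bytes, n_chars = counts(text)
    print(f"{n_bytes} bytes, {n_chars} chars  {name}")
```

With these assumed contents every sample comes out at 3 characters,
and dropping the final newline from a sample lowers its count to 2.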

thanks,
Pádraig.
