bug-coreutils

bug#10880: instead of characters, tr works on bytes


From: Chris Jones
Subject: bug#10880: instead of characters, tr works on bytes
Date: Mon, 27 Feb 2012 00:44:56 -0500
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Feb 24, 2012 at 09:29:12AM EST, Marton Kadar wrote:

[..]

> > $ set | grep ^L
> > LANG=hu_HU.UTF-8
> > LC_ALL=hu_HU.UTF-8
> > LINES=73
> > LOGNAME=kadar1marto518
> > 
> > Now let's see the bytestream for the following string
> > (which means flood in Hungarian):
> > 
> > $ echo árvíz | od -c
> > 0000000 303 241   r   v 303 255   z  \n
> > 0000010
> > 
> > Let us try to delete a character and see if it worked:
> > 
> > $ echo árvíz | tr -d á | od -c
> > 0000000   r   v 255   z  \n
> > 0000005

[..]

Try this for size...

$ echo árvíz | od -t x1z -w16 
$ echo árvíz | tr -d é | od -t x1z -w16 

$ echo árvíz | tr -d é > /tmp/u.txt
$ isutf8 /tmp/u.txt

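And there is not even an ‘é’ in ‘árvíz’. tr deletes the two bytes of
‘é’ (c3 a9) individually, and c3 is also the first byte of both ‘á’
(c3 a1) and ‘í’ (c3 ad), so the output is no longer valid UTF-8.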
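
For anyone who wants to reproduce this without depending on the terminal's
encoding, here is a minimal sketch. It assumes a POSIX printf, tr and od, and
spells the UTF-8 bytes out as octal escapes; LC_ALL=C is set only to make the
byte-wise behaviour explicit. The iconv check is a stand-in for isutf8 and
assumes an iconv that exits non-zero on invalid input. The variable names are
only for illustration.

```shell
# UTF-8 bytes in octal: á = 303 241, í = 303 255, é = 303 251.
# "árvíz\n" is therefore: 303 241 r v 303 255 z \n

# Delete the two bytes that make up é, one byte at a time, as tr does.
# The hex dump is squeezed into a single string for easy comparison.
bytewise=$(printf '\303\241rv\303\255z\n' \
  | LC_ALL=C tr -d '\303\251' \
  | od -An -t x1 \
  | tr -d ' \n')
echo "$bytewise"   # prints a17276ad7a0a: both c3 lead bytes are gone

# The leftover bytes a1 and ad are stray continuation bytes, so the
# result should be rejected as UTF-8 (assumes iconv fails on bad input).
if printf '\241rv\255z\n' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
  valid=yes
else
  valid=no
fi
echo "$valid"
```

The first pipeline shows the damage (a1 72 76 ad 7a 0a instead of the
original c3 a1 72 76 c3 ad 7a 0a); the second feeds those leftover bytes to
a UTF-8 decoder.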

CJ

P.S. Though you do have to look for it a bit, the coreutils manual
clearly states that only single-byte encodings are supported: 

http://www.gnu.org/software/coreutils/manual/html_node/tr-invocation.html

-- 
Mooo Canada!!!!