bug-bash

Re: UTF-8 issue


From: Tim Waugh
Subject: Re: UTF-8 issue
Date: Mon, 6 Dec 2004 16:37:41 +0000
User-agent: Mutt/1.4.1i

On Mon, Dec 06, 2004 at 09:51:54AM -0500, Chet Ramey wrote:

> Mariano Suárez-Alvarez wrote:
> >Hi,
> >
> >someone just made me note the following behavior with respect to UTF-8
> >handling: on a bash command line,
> >
> >        1) type: read A
> >        2) type a ñ character, that is, a U+00F1 LATIN SMALL LETTER N
> >        WITH TILDE character
> >        3) now backspace it away and hit Enter.
> >        4) now say: echo $A | od -x
> >        5) you should see 
> >        
> >                0000000 0ac3
> >                0000002
> >                
> >        although it should be just 0a. (Note UTF-8 for the ñ
> >        character is 0xC3 0xB1, so I'm getting the remnants of the
> >        deleted ñ) 
> >        
> >
> >I don't know if this is due to bash doing something wrong during the
> >read (maybe it does not set up the line discipline correctly?) or
> >something else. So you are my first try at nailing this ;-)
> 
> I am able to reproduce this using a UTF-8 locale, but I'm not sure it's
> bash's problem.  Since this is a buffered read, bash just calls read(2)
> and returns characters one at a time to the read builtin. read(2)
> returns two characters:  the first byte of the multibyte character, and
> newline.
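
One way to picture the symptom in the report (a lone 0xC3 left behind) is an erase that removes a single byte rather than a whole character. The sketch below only simulates that in the shell. It is an assumption about the mechanism, not something this thread confirms, and `hex_bytes` and `erase_last_byte` are hypothetical helper names.

```shell
# Hypothetical helpers, for illustration only.

# Print stdin as a run of lowercase hex bytes, e.g. "c3b1".
hex_bytes() {
    od -An -tx1 | tr -d ' \n'
}

# Drop the final *byte* of $1. The subshell forces the C locale so that
# the "?" pattern matches one byte even in a UTF-8 session.
erase_last_byte() {
    (
        LC_ALL=C
        s=$1
        printf '%s' "${s%?}"
    )
}

n_tilde=$(printf '\303\261')     # U+00F1 in UTF-8: bytes C3 B1

# What is left after a byte-wise erase, plus the newline echo would add.
printf '%s\n' "$(erase_last_byte "$n_tilde")" | hex_bytes
```

If the erase had removed the whole two-byte character, only the newline (0a) would remain, which is the result the original poster expected.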

I haven't been able to reproduce this problem at all:

$ read A
ñ^H
$ echo $A | od -tx1
0000000 c3 b1 08 0a
0000004

$ read -e A
   <-- here I entered the character and pressed backspace once
$ echo $A | od -tx1
0000000 0a
0000001
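
The two dumps above can be regenerated from known input, which makes each byte easier to read. This is a small sketch assuming a POSIX `printf` and `od`; `dump` is a hypothetical helper name, and the remark about `od -x` holds only on a little-endian machine.

```shell
# Hypothetical helper: print stdin one byte per column, no offset column.
dump() {
    od -An -tx1
}

# First dump: n-tilde (C3 B1), a literal ^H (08), then newline (0A).
printf '\303\261\010\n' | dump

# Second dump: only the newline that echo appends to an empty value.
printf '\n' | dump

# The original report used `od -x`, which prints 16-bit words in host
# byte order. On a little-endian machine the bytes C3 0A show as "0ac3".
printf '\303\n' | od -An -x
```

Using `od -tx1` rather than `od -x` avoids the byte-order question, since each byte is printed in the order it was read.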

GNU bash, version 3.00.16(1)-release (i386-redhat-linux-gnu)
$ rpm -q bash
bash-3.0-24
$ echo $LANG
en_GB.UTF-8
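
`$LANG` alone does not always name the locale in effect, because `LC_ALL` and `LC_CTYPE` take precedence over it. A rough sketch of that lookup follows; `effective_ctype` and `is_utf8_name` are hypothetical helpers, and matching on the name suffix is only a heuristic.

```shell
# Hypothetical helpers, for illustration only.

# Name of the locale that governs character handling: POSIX gives
# LC_ALL precedence over LC_CTYPE, and LC_CTYPE precedence over LANG.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if a locale name carries a UTF-8 codeset suffix.
is_utf8_name() {
    case $1 in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_name "$(effective_ctype)"; then
    echo "UTF-8 locale in effect"
else
    echo "not a UTF-8 locale: $(effective_ctype)"
fi
```

Where the `locale` utility is available, `locale charmap` reports the codeset directly and is more reliable than inspecting the name.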

Tim.