
Re: UTF-8 issue

From: Chet Ramey
Subject: Re: UTF-8 issue
Date: Mon, 06 Dec 2004 09:51:54 -0500
User-agent: Mozilla Thunderbird 0.9 (Macintosh/20041103)

Mariano Suárez-Alvarez wrote:

Someone just pointed out the following behavior to me with respect to UTF-8
handling: on a bash command line,

        1) Type: read A
        2) Type an ñ character, that is, U+00F1 LATIN SMALL LETTER N
           WITH TILDE.
        3) Backspace it away and hit Enter.
        4) Run: echo $A | od -x
        5) You should see 0000000 0ac3
           although it should be just 0a. (Note that the UTF-8 encoding of
           ñ is 0xC3 0xB1, so I'm getting the remnants of the deleted ñ.)
I don't know if this is due to bash doing something wrong during the
read (maybe it does not set up the line discipline correctly?) or
something else. So you are my first try at nailing this ;-)
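Reading the `od -x` output needs one caveat: `-x` groups bytes into two-byte words and prints each word in the host's byte order, so on a little-endian machine the bytes C3 0A show up as `0ac3`. A minimal sketch that reproduces the report's bytes non-interactively, assuming a POSIX `printf` that understands octal escapes (`\303\261` is C3 B1, the UTF-8 encoding of ñ) and using `od -t x1` to print one byte at a time in stream order:

```shell
#!/bin/sh
# UTF-8 for U+00F1 (n with tilde) is the two bytes C3 B1.
printf '\303\261' | od -An -t x1

# What "echo $A" emitted in the report: the orphaned lead byte C3,
# followed by the newline (0A) that echo appends.
printf '\303\n' | od -An -t x1

# The same two bytes seen through od -x: one 16-bit word, printed as
# 0ac3 on a little-endian host (c30a on a big-endian one).
printf '\303\n' | od -An -x
```

With an empty `A`, `echo` writes only the newline, which `od -x` shows as `000a` on a little-endian host.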

I am able to reproduce this using a UTF-8 locale, but I'm not sure it's
bash's problem.  Since this is a buffered read, bash just calls read(2)
and returns characters one at a time to the read builtin. read(2)
returns two characters:  the first byte of the multibyte character (0xC3),
and the newline (0x0A).
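The message does not spell out why only the first byte survives. One plausible explanation, offered here as an assumption rather than as a conclusion stated in the thread: in canonical mode the terminal driver, not bash, processes the backspace, and an erase that works byte by byte removes only B1, so the completed line handed to read(2) holds C3 and the newline. The sketch below imitates that with the shell's `${var%?}` pattern removal under `LC_ALL=C`, where `?` matches exactly one byte; the helper name `byte_erase` is hypothetical.

```shell
#!/bin/sh
# In the C locale every byte counts as one character, which models a
# terminal driver that knows nothing about UTF-8.
LC_ALL=C
export LC_ALL

# Hypothetical helper: drop the final byte, as a byte-wise ERASE would.
byte_erase() {
    printf '%s' "${1%?}"
}

typed=$(printf '\303\261')      # the user types ñ: bytes C3 B1
line=$(byte_erase "$typed")     # one backspace removes only B1
# Enter appends the newline; the result matches the bytes in the report.
printf '%s\n' "$line" | od -An -t x1
```

A character-aware erase would have removed both bytes. On Linux, for example, the canonical-mode erase is meant to honor UTF-8 only when the terminal's `iutf8` flag is set (see stty(1)).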

``The lyf so short, the craft so long to lerne.'' - Chaucer
( ``Discere est Dolere'' -- chet )
Chet Ramey, ITS, CWRU address@hidden http://tiswww.tis.cwru.edu/~chet/
