bug-bash


Re: square bracket vs. curly brace character ranges


From: Linda Walsh
Subject: Re: square bracket vs. curly brace character ranges
Date: Tue, 02 Oct 2012 10:52:38 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666



Chet Ramey wrote:
http://lists.gnu.org/archive/html/bug-bash/2012-05/msg00086.html
----
The above relies upon a hack to the algorithm -- a *USEFUL* hack
in most cases, but still a hack.

When I type locale, I get:
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
----
Note: before bash broke UTF-8 compatibility, I could use
en_US.UTF-8, but I assert that the current need to set LC_COLLATE=C
as above is a bug.
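To make the workaround concrete, here is a minimal sketch of how the effect of LC_COLLATE on a bracket range can be checked. The function name in_lower_range is made up for illustration, and the only result that can be relied on everywhere is the one for the C locale, where [a-z] covers the lowercase ASCII letters only.

```shell
#!/bin/sh
# Sketch: test whether one character matches the bracket range [a-z]
# under a chosen LC_COLLATE. in_lower_range is a hypothetical helper name.
in_lower_range() (
    # The body runs in a subshell, so the locale change does not leak out.
    unset LC_ALL            # LC_ALL, if set, would override LC_COLLATE
    LC_COLLATE=$1
    export LC_COLLATE
    case $2 in
        [a-z]) echo yes ;;
        *)     echo no ;;
    esac
)

in_lower_range C b    # lowercase letter: inside the range
in_lower_range C B    # uppercase letter: outside the range in the C locale
```

Passing en_US.UTF-8 instead of C as the first argument shows what the installed shell and C library do with that locale; the answer for B then depends on the bash version and on whether the locale is installed.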

I will make no claim about en_US.iso88591 or other locale-specific
charsets.  However, UTF-8 defines collation order the same as ASCII in
the lowest 128 code points (0-127).

Bash ignores UTF-8's collation order.  I really do not know if the
odd character collation order is associated with en_US -- but it seems
that collation order of the UTF-8 character set should override the more
general 'en_US'.

For some reason, I am not allowed to use LC_COLLATE=UTF-8:
-bash: warning: setlocale: LC_COLLATE: cannot change locale (UTF-8): No such file or directory
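The "No such file or directory" part of that warning suggests that no locale named plain UTF-8 is installed on the system. A rough way to check is to look the name up in the output of locale -a. The sketch below assumes the locale utility is present; normalize and locale_available are made-up helper names, and the normalization (drop hyphens, lowercase) only approximates how glibc lists codesets, e.g. en_US.utf8 for en_US.UTF-8.

```shell
#!/bin/sh
# Sketch: check whether a locale name appears in `locale -a`.
# normalize and locale_available are hypothetical helper names.

# Drop hyphens and lowercase, so en_US.UTF-8 compares equal to en_US.utf8.
normalize() {
    printf '%s\n' "$1" | sed 's/-//g' | tr '[:upper:]' '[:lower:]'
}

# Succeeds if the normalized name matches a whole line of `locale -a`.
locale_available() {
    want=$(normalize "$1")
    locale -a 2>/dev/null | sed 's/-//g' | tr '[:upper:]' '[:lower:]' |
        grep -Fxq -- "$want"
}

for name in C en_US.UTF-8 UTF-8; do
    if locale_available "$name"; then
        echo "$name: listed by locale -a"
    else
        echo "$name: not listed"
    fi
done
```

A name that is not listed cannot be loaded by setlocale, which would be consistent with the warning shown above.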

This seems related to the problem: in specifying UTF-8 (vs. utf8/UTF8),
perl and other programs have made the distinction that UTF-8 is the
official name, and that name comes with an official collation order.

Thus it seems like having LC_COLLATE=UTF-8 generate an error is a booboo
somewhere (gnu libs?)...

If I were in a Chinese locale and using a Chinese sorting order, I don't
know whether I would find an option to use ASCII sorting order useful.
But I would find it useful if it respected the UTF-8 collation
requirements, as it handles not only eE, but all the accented forms as well.

So would this be a LibC/icui18n bug?



