bug-bash

Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?


From: Greg Wooledge
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Mon, 21 May 2012 15:27:38 -0400
User-agent: Mutt/1.4.2.3i

On Mon, May 21, 2012 at 12:19:26PM -0700, Linda Walsh wrote:
> Greg Wooledge wrote:
> >For instance, on HP-UX 10.20, in the en_US.iso88591 locale:
> >    A  a  ...  B  b
> >Meanwhile, on Debian 6.0, in the en_US.iso88591 locale:
> >    a A   ...  b B

> So which is correct?

Both.  Locale collating order is determined by the OS's locale
definitions.  You cannot rely on it unless you set the LC_COLLATE
variable to "C" or "POSIX" (and LC_ALL, which overrides it, is not set
to something else), in which case you get ASCII ordering (accented
letters are not part of the character set at all).
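
A minimal sketch of the C-locale case, assuming a POSIX sort and paste
are available.  The four input letters are made up for illustration.
LC_ALL is set for the one command only, since it takes precedence over
LC_COLLATE if both are present.

```shell
# Sort a few letters with the C locale forced for this one pipeline stage.
# In the C locale, collation follows byte values, so uppercase ASCII
# letters (65-90) come before lowercase ones (97-122).
c_sorted=$(printf '%s\n' b A a B | LC_ALL=C sort | paste -s -d ' ' -)
echo "$c_sorted"    # prints: A B a b
```

Running the same pipeline without LC_ALL=C may print a different order,
depending on the OS and the locale in effect.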

> Anyone wanting to reference an upper or lower case range
> [a-z] or [A-Z], is gonna hurt from this.

Correct.

imadev:~$ echo Hello World | tr 'A-Z' 'a-z'
hÉMMÓ wÓSMÐ
imadev:~$ echo Hello World | tr '[:upper:]' '[:lower:]'
hello world

You *cannot* rely on [a-z] or [A-Z] to mean "the ASCII lowercase/uppercase
letters" any more, except in the C/POSIX locale.  If you want to match
lowercase characters, you should be using [[:lower:]], and for uppercase
characters, [[:upper:]].
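
A rough sketch of both options in portable sh, under the assumption
that the shell and tr support POSIX character classes.  The function
names (ascii_lower, locale_lower, is_all_lower) are hypothetical and
exist only for this example.

```shell
# Option 1: force the C locale for one command, so A-Z and a-z are the
# plain ASCII ranges regardless of the user's locale settings.
ascii_lower() {
    printf '%s\n' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

# Option 2: stay in the current locale and use character classes
# instead of ranges.
locale_lower() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: succeeds only if $1 is non-empty and every
# character in it is lowercase according to the current locale.
is_all_lower() {
    case $1 in
        ''|*[![:lower:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

ascii_lower 'Hello World'
locale_lower 'Hello World'
is_all_lower hello && echo 'hello: all lowercase'
is_all_lower Hello || echo 'Hello: not all lowercase'
```

Option 1 only handles ASCII letters; option 2 follows whatever the
current locale considers upper and lower case.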


