bug#11621: questionable locale sorting order (especially as related to c
From: Linda Walsh
Subject: bug#11621: questionable locale sorting order (especially as related to char ranges in REs)
Date: Sun, 03 Jun 2012 15:13:19 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.24) Gecko/20100228 Lightning/0.9 Thunderbird/2.0.0.24 Mnenhy/0.7.6.666
Within the past few years, the use of ranges in REs has become
unreliable because some locales now sort their native character
sets such that a<A<b<B<y<Y<z<Z (vs. the 'C' ordering A<B<Y<Z<a<b<y<z).
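To make the difference concrete, here is a minimal Python sketch of the two orderings and of what a range like [a-z] would match under each. The helper names (interleaved_key, in_range) are made up for illustration, and the interleaved key only imitates the a<A<b<B ordering described above; it is not what any real locale implementation does.

```python
import string

def interleaved_key(ch):
    # Hypothetical key imitating a<A<b<B<...<z<Z:
    # compare by letter first, then lower case before upper case.
    return (ch.lower(), ch.isupper())

def in_range(ch, lo, hi, key=ord):
    # True if ch falls between lo and hi under the given ordering.
    return key(lo) <= key(ch) <= key(hi)

letters = string.ascii_letters

# 'C' ordering: plain code-point comparison.
c_order = sorted(letters)
# Interleaved ordering, as in the locales complained about.
interleaved = sorted(letters, key=interleaved_key)

# What would the range a-z cover under each ordering?
c_range = [ch for ch in letters if in_range(ch, "a", "z")]
loc_range = [ch for ch in letters if in_range(ch, "a", "z", interleaved_key)]

print("".join(c_order))
print("".join(interleaved))
print("".join(c_range))
print("".join(sorted(loc_range, key=interleaved_key)))
```

Under the interleaved ordering the range a-z picks up every upper-case letter except Z, which is the kind of surprise that breaks older scripts.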
Additionally, many distros have switched to UTF-8, resulting in
locales like en_GB.UTF-8, en_US.UTF-8, etc.
There seems to be a problem when a user has set their system to use
Unicode: it no longer uses the locale-specific character set
(iso-8859-x or others).
In Unicode, it is recommended that upper case be uniformly sorted
below lower case (section 6.6, http://www.unicode.org/reports/tr10/).
A chart, including accent variations, is at
http://unicode.org/charts/case/chart_Latin.htm.
Temporarily ignoring accents and considering only lower- and upper-case
letters, you will note that the upper-case letters have the (hexadecimal)
values A=41, B=42, C=43, while the lower-case letters, starting from 'a',
have the values a=61, b=62, c=63.
This uniformly puts all lower-case letters "after" any upper-case letters.
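The values quoted above match the letters' code points written in hexadecimal. A short Python check, offered only as an illustration (the variable names are arbitrary):

```python
import string

# Code points of a few letters, printed in hexadecimal.
for ch in "ABCabc":
    print(ch, format(ord(ch), "x"))

# Highest upper-case code point vs. lowest lower-case code point.
max_upper = max(ord(ch) for ch in string.ascii_uppercase)
min_lower = min(ord(ch) for ch in string.ascii_lowercase)
print(max_upper < min_lower)
```

Comparing by code point therefore places every ASCII upper-case letter before every ASCII lower-case letter.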
Thus -- I am asserting that any computer that uses a locale for country
preferences BUT also uses a Unicode character set (e.g. UTF-8)
should return sorted results as specified by the character set.
I.e. the utility 'sort' (and any programs that use the collation/sorting
order specified in the coreutils libs) should return A-Z < a-z.
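One way to see what a given system actually does is to sort the same letters under the 'C' locale and under a UTF-8 locale. The sketch below uses Python's standard locale module; sort_in_locale is a hypothetical helper, and en_US.UTF-8 is only an assumed locale name that may not be installed, in which case the helper returns None. The UTF-8 result depends entirely on the system's locale data, so no particular output is claimed for it.

```python
import locale

def sort_in_locale(words, name):
    """Sort words with the collation of locale `name`; None if unavailable."""
    old = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None  # locale not installed on this system
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)  # restore

words = ["b", "A", "a", "B", "z", "Z"]

c_sorted = sort_in_locale(words, "C")
utf8_sorted = sort_in_locale(words, "en_US.UTF-8")

print("C:          ", c_sorted)
print("en_US.UTF-8:", utf8_sorted)
```

Under the 'C' locale the upper-case letters come first; whether the UTF-8 locale agrees is exactly the question raised in this report.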
This is currently not the case and is leading to erroneous results
in programs written before locales were considered. The thing is --
in many cases, within a short period of locales being implemented,
many or most distros also switched to UTF-8.
Unfortunately, its collation order has not been respected.
I would assert this is a serious bug that should be addressed ASAP...
Thanks,
Linda W.