bug-coreutils

sort behavior - Ubuntu problem?


From: Kevin Scannell
Subject: sort behavior - Ubuntu problem?
Date: Tue, 23 Jan 2007 19:50:55 -0600

I suspect that the behavior I describe below is caused by broken
locale definition files, but I wanted to get an expert opinion on this
before I go trying to find who maintains those upstream.

I know about the "sort does not sort" FAQ, and I don't think that I've
fallen into that trap, so please keep reading!

Anyway, here's a sample file, utf-8 encoded text.
http://borel.slu.edu/obair/test.txt

$ uname -a
Linux borel 2.6.17-10-generic #2 SMP Fri Oct 13 18:45:35 UTC 2006 i686 GNU/Linux

$ sort --version
sort (GNU coreutils) 5.96
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License <http://www.gnu.org/licenses/gpl.html>.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

$ locale
LANG=
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8

$ sort test.txt
a
á
áa
aá
az
áz
ázzzza
azzzzá

The acute-a collates after the "a" (correctly) except when there are
additional non-ASCII characters on the same line. I see this also
with ga_IE.utf8, which is the locale I usually use and the one I care
about. This sort order is definitely wrong there.

The thing that leads me to believe that the problem lies with the
locale definition file is that on a different machine, running Gentoo,
same conditions as above, this file sorts as I want it to, in
dictionary order:

$ uname -a
Linux turing 2.6.17-gentoo-r4 #2 SMP Mon Aug 28 12:53:48 CDT 2006 x86_64 AMD Opteron(tm) Processor 246 AuthenticAMD GNU/Linux

$ sort test.txt
a
á
aá
áa
az
áz
azzzzá
ázzzza
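
One possible reading of the two outputs, offered as an assumption rather
than something confirmed from the locale definition files: both machines
agree on the base letters, and differ only in how accents break ties.
Scanning the accents left to right gives the second ordering; scanning
them right to left ("backward" accent ordering, as in French dictionary
rules) gives the first. The Python sketch below is a toy model of that
idea, not how sort or the C library computes keys; the helper names
split_accents and collation_key are hypothetical.

```python
import unicodedata


def split_accents(word):
    """Return (base letters, per-letter accent flags) for a word."""
    base, accents = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Combining mark: flag the preceding base letter as accented.
            if accents:
                accents[-1] = 1
        else:
            base.append(ch)
            accents.append(0)
    return "".join(base), accents


def collation_key(word, backward_accents=False):
    """Toy two-level key: base letters first, then accents as a tie-breaker.

    Assumption: with backward_accents=True the accent flags are compared
    from the end of the word, otherwise from the start.
    """
    base, accents = split_accents(word)
    if backward_accents:
        accents = accents[::-1]
    return (base, accents)


words = ["a", "á", "aá", "áa", "az", "áz", "azzzzá", "ázzzza"]

forward = sorted(words, key=collation_key)
backward = sorted(words, key=lambda w: collation_key(w, backward_accents=True))

print("forward: ", " ".join(forward))
print("backward:", " ".join(backward))
```

If this model is right, the difference between the two machines would
come down to a single direction setting for the accent level in the
collation rules, which is the kind of thing a locale definition file
controls.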

Any advice would be appreciated.
Kevin
