bug-coreutils

bug#24601: UTF-8 locale makes lexicographic sort weird


From: mathew
Subject: bug#24601: UTF-8 locale makes lexicographic sort weird
Date: Mon, 03 Oct 2016 19:54:02 +0000

coreutils-8.25 compiled from source on Fedora 24:

% echo "+00\n-0c\n+02\n-02" | src/sort
+00
-02
+02
-0c

This seems to be due to the locale:

% echo "+00\n-0c\n+02\n-02" | LC_ALL=C src/sort
+00
+02
-02
-0c

% echo "+00\n-0c\n+02\n-02" | LC_ALL=en_US.UTF-8 src/sort
+00
-02
+02
-0c
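
For anyone trying to reproduce this: whether echo expands "\n" varies between shells, so here is a minimal sketch that feeds the same four lines through printf instead. It assumes a POSIX shell and whatever sort is first on PATH (not necessarily the freshly built src/sort), and the helper name `input` is just for illustration. Only the C-locale order is stated as expected, since the UTF-8 result depends on the locale data installed on the system.

```shell
# Emit the four test lines; printf '%s\n' avoids relying on echo
# expanding "\n", which differs between shells.
input() { printf '%s\n' +00 -0c +02 -02; }

# C locale: plain byte comparison, so '+' (0x2B) sorts before '-' (0x2D).
# Expected order: +00, +02, -02, -0c
c_sorted=$(input | LC_ALL=C sort)
printf 'LC_ALL=C:\n%s\n' "$c_sorted"

# UTF-8 locale: the order depends on the system's collation data, so no
# expected output is given here. If en_US.UTF-8 is not installed, the
# result will typically match the C locale.
utf8_sorted=$(input | LC_ALL=en_US.UTF-8 sort)
printf 'LC_ALL=en_US.UTF-8:\n%s\n' "$utf8_sorted"
```

Substituting src/sort for sort in the two pipelines should compare the built binary directly.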

Since OS X 10.11 still comes with coreutils 5.93, I tried its sort as well:

% echo "+00\n-0c\n+02\n-02" | LC_ALL=en_US.UTF-8 sort
+00
+02
-02
-0c

I've taken a look at the Unicode collation standard, and I can't
immediately see anything that explains the current (8.25) behavior.

I've also played around with
<http://demo.icu-project.org/icu-bin/locexp?_=en_US.UTF-8&d_=en&x=col>
and I can't come up with any set of Unicode collation options that gives
the same results.


mathew

