bug#24601: UTF-8 locale makes lexicographic sort weird
From: mathew
Subject: bug#24601: UTF-8 locale makes lexicographic sort weird
Date: Mon, 03 Oct 2016 19:54:02 +0000
coreutils-8.25 compiled from source on Fedora 24:
% echo "+00\n-0c\n+02\n-02" | src/sort
+00
-02
+02
-0c
This seems to be due to the locale:
% echo "+00\n-0c\n+02\n-02" | LC_ALL=C src/sort
+00
+02
-02
-0c
% echo "+00\n-0c\n+02\n-02" | LC_ALL=en_US.UTF-8 src/sort
+00
-02
+02
-0c
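For anyone reproducing this in a shell whose echo does not expand \n, here is a minimal sketch of the same comparison using printf. It assumes a sort on PATH rather than src/sort, and the helper name input is made up for illustration. Only the LC_ALL=C ordering is predictable; the en_US.UTF-8 ordering depends on the collation data installed on the system, and sort falls back to byte order if that locale is missing.

```shell
#!/bin/sh
# Hypothetical helper: emit the four test lines from this report,
# one per line, without relying on echo expanding "\n".
input() {
    printf '%s\n' '+00' '-0c' '+02' '-02'
}

# Byte-order comparison: '+' (0x2B) sorts before '-' (0x2D).
c_sorted=$(input | LC_ALL=C sort)
printf 'LC_ALL=C:\n%s\n' "$c_sorted"

# Locale-aware comparison: the result depends on the system's
# collation tables, so no particular order is assumed here.
utf8_sorted=$(input | LC_ALL=en_US.UTF-8 sort 2>/dev/null)
printf 'LC_ALL=en_US.UTF-8:\n%s\n' "$utf8_sorted"
```

If the two listings differ, the difference comes from the locale's collation rules and not from the input handling.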
Since OS X 10.11 still comes with coreutils 5.93, I tried that:
% echo "+00\n-0c\n+02\n-02" | LC_ALL=en_US.UTF-8 sort
+00
+02
-02
-0c
I've taken a look at the Unicode collation standard, and I can't
immediately see anything that explains the current (8.25) behavior.
I've also played around with
<http://demo.icu-project.org/icu-bin/locexp?_=en_US.UTF-8&d_=en&x=col> and I
can't come up with any set of Unicode collation options that gives the same
results.
mathew