bug#23012: add option to specific locale to sort
From: John Heidemann
Subject: bug#23012: add option to specific locale to sort
Date: Mon, 14 Mar 2016 11:02:55 -0700
Locale-specific sorting produces surprising results.
While locale-specific sorting is entirely as POSIX specifies, the details
are obscure and can be confusing.
(See for example this comment in the code:
/* Always output the locale in debug mode, since this
is such a common source of confusion. */
and the FAQ entry "Sort does not sort in normal order!" at
http://www.gnu.org/software/coreutils/faq/coreutils-faq.html
)
Currently, locale-specific results can be controlled only through
environment variables such as LC_ALL, LC_COLLATE, or LANG. However, this
approach results in "spooky action at a distance": it is not obvious to
users, and it can be hard to control when sort is invoked from other
programs.
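Part of the confusion is that several variables interact. POSIX defines the precedence: a non-empty LC_ALL overrides LC_COLLATE, which in turn overrides LANG, so an LC_COLLATE=C prefix like the one in the test case below has no effect if the user's environment already exports LC_ALL. The function below is a hypothetical helper (effective_collate is not an existing utility) that applies those rules in plain POSIX shell to report which locale name will govern collation; it is only a sketch and does not check whether that locale is actually installed.

```shell
# Hypothetical helper: print the locale name that governs LC_COLLATE,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# Variables that are set but empty are treated as unset, as POSIX specifies.
effective_collate() {
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        # POSIX leaves this default implementation-defined; it is
        # normally the C (POSIX) locale.
        printf '%s\n' C
    fi
}
```

For example, after LC_ALL=en_US.UTF-8 LC_COLLATE=C in the current shell, effective_collate still prints en_US.UTF-8.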
Suggested enhancement: it should be possible to specify the locale on
the command line, making this behavior easier to discover and control.
A patch at
http://www.isi.edu/~johnh/SOFTWARE/sort_locale_option_160314.patch
adds --locale=WHATEVER and -L options to accomplish this goal.
The patch is against coreutils-8.24.
Please consider it for inclusion in coreutils.
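The patch itself is not reproduced here. As a rough sketch of the proposed interface on an unpatched sort, the hypothetical wrapper lsort below accepts --locale=NAME or -L NAME and runs sort with LC_ALL=NAME. Setting LC_ALL for the child process is an assumption about how to approximate the option; the actual patch may instead select only the collation category inside sort.

```shell
# Hypothetical wrapper emulating the proposed interface on an unpatched
# sort: "lsort --locale=NAME ..." or "lsort -L NAME ..." runs sort with
# LC_ALL=NAME; every other argument is passed through unchanged.
# Limitations: bundles such as -rL and the "--" end-of-options marker
# are not recognized.
lsort() {
    _loc=
    _n=$#
    # Rotate through the original arguments once, re-appending the ones
    # that are not locale options so their order is preserved.
    while [ "$_n" -gt 0 ]; do
        _arg=$1
        shift
        _n=$((_n - 1))
        case $_arg in
            --locale=*)
                _loc=${_arg#--locale=} ;;
            -L)
                if [ "$_n" -eq 0 ]; then
                    echo "lsort: option -L requires an argument" >&2
                    return 2
                fi
                _loc=$1
                shift
                _n=$((_n - 1)) ;;
            *)
                set -- "$@" "$_arg" ;;
        esac
    done
    if [ -n "$_loc" ]; then
        # LC_ALL overrides LC_COLLATE and LANG, so the caller's environment
        # cannot undo the choice. It also changes the other locale
        # categories (e.g. LC_NUMERIC, which affects sort -n).
        LC_ALL=$_loc sort "$@"
    else
        sort "$@"
    fi
}
```

With it, lsort --locale=C and lsort -L en_US.utf8 mirror the two ./sort invocations shown further down. Using a prefix assignment keeps the locale change scoped to the one sort process, which addresses the "spooky action" concern for callers that can wrap the command.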
A test case that exhibits locale-specific oddness with the current sort:
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 1.0 is first as Kernighan intended'; } | LC_COLLATE=C sort
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 100.0.2 is first, for fun and confusion'; } | LC_COLLATE=en_US.utf8 sort
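In the C locale, sort compares lines byte by byte, so the order of the first command follows from byte values: the three numeric lines first differ at their second character ('.', '0', and 'x'), and the descriptive line starts with 't', which sorts after '1'. The hypothetical helper char_codes below prints these values using the POSIX printf rule that an argument beginning with a quote yields the numeric code of the character after it.

```shell
# Hypothetical helper: print each argument's character with its numeric code.
char_codes() {
    for c in "$@"; do
        printf '%s %d\n' "$c" "'$c"
    done
}

# The numeric test lines first differ at their second byte.
char_codes . 0 x
# prints:
# . 46
# 0 48
# x 120
```

Since 46 < 48 < 120, the C locale yields 1.0.2, 100.0.2, 1x0.2. The en_US.utf8 order does not follow byte values; the result described in the second command is consistent with glibc's collation giving the dots no weight on the first comparison pass (so 100.0.2 compares roughly like 10002 against 102), though that reading is an inference from the example.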
And the happiness that ensues from controlling the locale without environment variables:
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 1.0 is first as Kernighan intended'; } | ./sort --locale=C
{ echo '100.0.2'; echo '1.0.2'; echo '1x0.2'; echo 'the 100.0.2 is first, for fun and confusion'; } | ./sort --locale=en_US.utf8
If the coreutils maintainers consider this patch acceptable, I will also
write a patch that updates the documentation.
(You may also want to add this option to other tools. That part is
left as an exercise for the reader. :-)
-John Heidemann