coreutils

Re: Sort option for Posix-locale simple comparisons


From: Eric Blake
Subject: Re: Sort option for Posix-locale simple comparisons
Date: Mon, 08 Apr 2013 09:00:29 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130402 Thunderbird/17.0.5

On 04/08/2013 01:27 AM, Ray Dillinger wrote:
> It turns out that 'sort' is grabbing locale information now and doing a
> locale-aware sort

Yes, this behavior has been required by POSIX for more than 20 years now
(POSIX 1003.2-1992 was the first document that standardized this
behavior, and it standardized what was already existing practice at that
time).

> (hence failing to treat different lengths of blankspace differently
> and failing to treat any punctuation characters as significant -- at
> least in my case).

Yes, this is one of the effects of 'sort' being required to do
locale-aware sorting.  In fact, it is a FAQ:
https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021
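
As a rough illustration of the effect described above, the following
sketch sorts a few made-up lines twice. The function name `sample` and
the variable `c_sorted` are invented for this example. Only the C-locale
result is predictable; the second run depends on whichever locale the
environment selects.

```shell
# Hypothetical sample data: spaces, punctuation, and case decide the order.
sample() {
    printf '%s\n' 'a b' 'a-c' 'ab' 'B' 'a'
}

# Byte-value collation: lines are compared byte by byte,
# so space < '-' < 'B' < 'a' < 'b'.
c_sorted=$(sample | LC_ALL=C sort)
printf '%s\n' "$c_sorted"

# Collation from the current environment; the order may differ from the
# one above if a non-C locale is in effect.
sample | sort
```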

> There is a workaround; one can set the locale to 'C' or 'POSIX' directly
> in a script (or at the shell prompt) and then set it back after calling
> 'sort'.

That is not just a workaround, but the POSIX-mandated way to get sane
sorting results.  Script writers have been doing this for years.
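
A minimal sketch of the usual idiom, assuming a POSIX shell: prefixing
the assignment to the command places it in the environment of that one
'sort' invocation only, so there is nothing to set back afterwards. The
variable names `before`, `after`, and `sorted` are illustrative.

```shell
# Record the caller's LC_ALL (or the word "unset") before the call.
before=${LC_ALL-unset}

# The assignment applies only to this single invocation of sort.
sorted=$(printf '%s\n' 'b' 'A' 'a' 'B' | LC_ALL=C sort)
printf '%s\n' "$sorted"

# The shell's own setting is untouched after the call.
after=${LC_ALL-unset}
```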

> But I dislike that workaround firstly because it complicates the
> writing of scripts adding boilerplate in many scripts that could be
> added instead just in 'sort' itself, secondly because I don't want to
> be mucking around with the locale from the command line, thirdly
> because that means people with other locales can't get error messages
> etc in their own languages if they're using a simplified sort, and
> fourth because there are too many ways it can fail.

You can still get error messages in a language you want, while still
collating in the C locale, by setting LC_COLLATE=C and leaving LC_ALL
unset.  But as to your dislike in using locale environment variables for
their intended purpose, you'll just have to get over that the way other
script writers have learned to do.
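
A sketch of that approach, assuming a POSIX shell. LC_ALL takes
precedence over LC_COLLATE, so it is unset first; doing so inside a
command substitution keeps the change from leaking into the calling
shell. The variable name `collated` is illustrative.

```shell
collated=$(
    # LC_ALL would override LC_COLLATE, so clear it in this subshell only.
    unset LC_ALL
    # Collation uses the C locale; LC_MESSAGES and LANG are left alone,
    # so diagnostics can still follow the user's language settings.
    printf '%s\n' 'b' 'A' 'a' 'B' | LC_COLLATE=C sort
)
printf '%s\n' "$collated"
```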

> So I decided it would be cleaner to hack a new command line option into
> 'sort' itself to explicitly invoke the simple traditional sorting behavior.
> Since 'c' and 'C' are already taken, I used the 'POSIX' locale instead
> of the 'C' locale, and gave it short option '-P' and long option
> '--posix-simple', with help string 'use POSIX locale (simple byte-value)
> comparisons.'

Thanks for trying to write a patch.  However, it is unlikely that we
will apply the patch, because the existing POSIX-mandated use of
LANG/LC_COLLATE/LC_ALL sufficiently exposes the knob in a portable
manner, while your option would only appear in GNU coreutils (and even
then, it would take a couple of years before it hits all the distros you
are likely to use), and teaching people to rely on non-portable
extensions when a portable solution already exists is a bit
counterproductive.

> The diff is against the Debian distribution's coreutils-8.13 source code,

We prefer diffs against the latest coreutils.git, as the sources for
sort.c have changed since the last Debian release.

> I have attached the diff file.

Your diff file was in the default 'normal' form (diff without options),
which reads much like an ed script.  This form is useless if the source
has changed since your patch was written, since it contains no context
on which lines were intended to be changed.  Also, you attached the
entire body of sort.c, which doesn't really help us.  We prefer patches
in unified form (diff -u), and can also use patches in context form
(diff -c), and with no repeat of sort.c.

Read HACKING for more details on the preferred way to supply a patch.
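
To make the difference between the diff forms concrete, here is a small
sketch on two throwaway files. The file names and contents are made up,
and `mktemp -d` is assumed to be available (it is common but not
required by POSIX). This only contrasts the output formats; it is not
the project's patch submission procedure.

```shell
# Scratch directory for the hypothetical old and new versions of a file.
workdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'gamma' > "$workdir/old.txt"
printf '%s\n' 'alpha' 'BETA' 'gamma' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, which is expected here.
# Without options: only line numbers and the changed lines, no context.
diff "$workdir/old.txt" "$workdir/new.txt" > "$workdir/plain.diff" || true

# Unified form: the change plus surrounding unchanged lines, which lets
# patch locate the hunk even if the file has shifted since.
diff -u "$workdir/old.txt" "$workdir/new.txt" > "$workdir/unified.diff" || true

cat "$workdir/plain.diff"
cat "$workdir/unified.diff"
```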

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

