bug-coreutils

bug#19142: sort not working with LANG set to language_country.encoding


From: Bob Proulx
Subject: bug#19142: sort not working with LANG set to language_country.encoding
Date: Fri, 21 Nov 2014 22:49:41 -0700
User-agent: Mutt/1.5.23 (2014-03-12)

tag 19142 notabug
close 19142
thanks

Roland Sieker wrote:
> I have noticed that sort seems to have problems when the LANG environment
> variable is set with language and country.

Sort is definitely affected by LANG, because LANG supplies the
default for LC_COLLATE (unless LC_ALL or LC_COLLATE is set), and
LC_COLLATE controls the collation sequence.  Different locales have
different collating sequences.  I don't like that English locales such
as my own country's en_US.UTF-8 and others like en_GB.UTF-8 don't sort
"correctly" as far as I am concerned, but I have to accept it.  Sort
order is actually determined by libc, so it affects much more than
sort.  It also affects ls, the shell, and basically everything on the
system that sorts.
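A minimal sketch of the POSIX precedence rule, assuming a POSIX shell: LC_ALL wins if it is set and non-empty, then LC_COLLATE, then LANG, and otherwise the implementation default (C/POSIX on glibc).  The function name effective_collate is hypothetical; it only reports which locale name would be requested for collation and does not check whether that locale is actually installed.

```shell
# Hypothetical helper: print the locale name the LC_COLLATE category
# would be requested from, following the POSIX precedence rules.
# It does NOT check whether the named locale actually exists.
effective_collate() {
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"        # LC_ALL overrides everything
  elif [ -n "${LC_COLLATE:-}" ]; then
    printf '%s\n' "$LC_COLLATE"    # then the category-specific variable
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"          # then LANG as the general default
  else
    printf '%s\n' C                # implementation default (C on glibc)
  fi
}

# LANG decides the collation only when nothing overrides it.
( unset LC_ALL LC_COLLATE; LANG=en_GB.UTF-8; effective_collate )   # en_GB.UTF-8
( LC_ALL=C; LANG=en_GB.UTF-8; effective_collate )                  # C
```

This is why "LC_ALL=C sort" is the usual way to get plain byte order regardless of what LANG says.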

> It sorts OK like this, with LANG just the language.encoding:
> ( setenv LANG en.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )
> a
> a
> b

Are you sure "en.UTF-8" is a valid locale?  It doesn't look like it to
me.  I think that is an invalid locale and therefore libc is falling
back to the C/POSIX locale.
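To see what the C/POSIX fallback looks like, you can force it explicitly with LC_ALL=C.  In that locale sort compares raw bytes, so uppercase ASCII sorts before lowercase, and the UTF-8 encoded radicals (whose first byte is 0xE2) sort after all ASCII.  The input here is the reporter's data plus an uppercase B, assuming a UTF-8 terminal; this shows the fallback behavior, not a way to tell whether a locale name is valid.

```shell
# Force the C/POSIX locale: sort compares lines byte by byte.
# 'B' (0x42) sorts before 'a' (0x61) and 'b' (0x62);
# the radicals are encoded as E2 BA xx in UTF-8 and therefore sort last.
printf 'a\nb\nB\na\n⺌\n⺕\n⺌\n' | LC_ALL=C sort
# Expected output:
# B
# a
# a
# b
# ⺌
# ⺌
# ⺕
```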

> But not with LANG as language_country.encoding:
> ( setenv LANG en_GB.UTF-8 ; echo 'a\nb\na\n⺌\n⺕\n⺌' | sort )

Here "en_GB.UTF-8" is a valid locale, and en_GB.UTF-8 uses dictionary
sort ordering.  Dictionary order folds case and ignores punctuation.
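The case folding and punctuation skipping described here can be roughly imitated in the C locale with sort's own -d (consider only blanks and alphanumerics) and -f (fold lowercase to uppercase) options.  This is only an approximation for illustration: glibc's locale collation uses its own multi-level rules, which -df does not reproduce exactly.

```shell
# Plain byte order: '.' (0x2E) < 'A' (0x41) < 'b' (0x62).
printf 'b\nA\n.c\n' | LC_ALL=C sort
# Expected: .c, A, b (one per line)

# -d ignores the '.', -f folds case before comparing,
# so the comparison keys become B, A and C.
printf 'b\nA\n.c\n' | LC_ALL=C sort -df
# Expected: A, b, .c (one per line)
```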

Try using the newish sort --debug option.  It will help debug problems
such as this.

  $ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en_US.UTF-8 sort --debug
  sort: using ‘en_US.UTF-8’ sorting rules
  ...

  $ printf "a\nb\na\n⺌\n⺕\n⺌\n" | env LC_ALL=en.UTF-8 sort --debug
  sort: using simple byte comparison
  ...

See also the FAQ entry:

  https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#Sort-does-not-sort-in-normal-order_0021

Bob