bug-coreutils

bug#6327: sort fails on some UTF-8 input


From: Paul Eggert
Subject: bug#6327: sort fails on some UTF-8 input
Date: Wed, 02 Jun 2010 12:37:58 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4

On 06/01/2010 09:51 PM, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
> 
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:

Amusingly enough, on that same test case I found the same problem
with GNU 'sort' that you did, but I also found that Solaris 'sort'
reports that it runs out of memory, even in 64-bit mode.  For example:

1010-kiwi $ LC_ALL=en_CA.UTF-8 /usr/bin/sparcv9/sort sort_test.txt 
sort: insufficient memory; use -S option to increase allocation
1011-kiwi $ LC_ALL=en_CA.UTF-8 coreutils-8.5/src/sort sort_test.txt
coreutils-8.5/src/sort: string comparison failed: Illegal byte sequence
coreutils-8.5/src/sort: Set LC_ALL='C' to work around the problem.
coreutils-8.5/src/sort: The strings compared were 
`\360\222\203\276\360\222\205\226' and 
`\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.

I expect that the exact failure mode depends on the locale
(and perhaps on whether you're using x86 or SPARC),
and that GNU 'sort' checks for strcoll failures whereas
Solaris 'sort' does not (and thus crashes).  If my guess is right,
this is a bug in the Solaris strcoll implementation.
I don't see a simple workaround.  You might file a bug report
with Sun.




