bug-gnulib

Re: bug#6327: sort fails on some UTF-8 input


From: Eric Blake
Subject: Re: bug#6327: sort fails on some UTF-8 input
Date: Wed, 02 Jun 2010 08:40:19 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b2pre Mnenhy/0.8.2 Thunderbird/3.0.4

[adding gnulib]

On 06/01/2010 10:51 PM, River Tarnell wrote:
> I'm using coreutils 8.5 on Solaris 10.
> 
> GNU 'sort' fails to sort some input, while Solaris 'sort' handles it
> correctly:
> 
> willow% /opt/ts/gnu/bin/sort sort_test.txt 
> /opt/ts/gnu/bin/sort: string comparison failed: Illegal byte sequence
> /opt/ts/gnu/bin/sort: Set LC_ALL='C' to work around the problem.
> /opt/ts/gnu/bin/sort: The strings compared were
> `\360\222\203\276\360\222\205\226' and
> `\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'.

Thanks for the report.  What locale are you using (that is, the entire
output of 'locale')?  I could not reproduce failure using:

$ export LC_ALL; for f in $(locale -a); do LC_ALL=$f || continue;
    sort sort_test.txt >/dev/null || { echo $f; break; }; done

on a GNU/Linux system with 732 installed locales.  However, it is quite
possible that you are in a non-UTF-8 locale, or that the Solaris
multibyte functions are less robust than glibc's at recognizing valid
UTF-8 sequences.  If it is indeed a bug in Solaris strcoll(), then gnulib
can probably be taught to work around it.
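
To help tell the two cases apart, here is a rough sketch that checks whether the two strings from the error message are well-formed UTF-8 at all. It assumes iconv is installed and exits non-zero on an invalid sequence; the helper name is_valid_utf8 is made up for illustration and is not part of coreutils or gnulib.

```shell
# Return success if the bytes named by an octal-escaped string are valid UTF-8.
# Assumption: iconv is available and exits non-zero on an invalid sequence.
is_valid_utf8() {
    # The argument is used as the printf format so that \ooo escapes are decoded.
    printf "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# The two strings quoted in the error message from sort.
first='\360\222\203\276\360\222\205\226'
second='\360\222\200\255\360\222\213\253\360\222\213\253\360\222\200\255'

for s in "$first" "$second"; do
    if is_valid_utf8 "$s"; then
        echo "valid UTF-8"
    else
        echo "invalid UTF-8"
    fi
done
```

If both strings are reported as valid (they are four-byte sequences, so code points above U+FFFF), the input itself is fine and the locale setting or the platform's strcoll() becomes the more likely suspect.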

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
