bug-coreutils

bug#15450: SORT failing on some lines


From: Eric Blake
Subject: bug#15450: SORT failing on some lines
Date: Wed, 25 Sep 2013 13:41:52 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8

tag 15450 -moreinfo
tag 15450 +notabug
thanks

On 09/25/2013 12:28 PM, address@hidden wrote:
> 
> Hello Eric,
> Thank you kindly for your speedy reply.
> I should apologize for the lack of information included with my email.
> It was a hurried one.

Re-adding the list for closure, with permission.

> 
> In fact your suggestions and link and a bit of tinkering have cured the
> problem. SORT works fine it seems. I should have had more faith.
> The problem was purely with Locale, which I read up on in the FAQ link
> you sent. I had looked at Locale previously but didn't seem to have any
> success with it. I had also been trying various options for SORT,
> including -i, -d and even the field separation (-t'#' -k1,1). I didn't
> have any luck, but I realized after reading through your reply that it
> was the combination of these things which I hadn't got right.
> 
> I'd just like to add here, for anybody else who stumbles across this same
> problem, a more detailed description of the problem I was having (now
> solved).
> 
> The text file was a 605MB list of title texts extracted from Wikipedia,
> each separated by a #--# from the 'long long' integer offsets of
> where the article appeared in the (XML) dump file.
> Example lines:
> 
> Alps Electric#--#7701298893,12,24,364,394,420
> Alps Electric Co.#--#4280442890,12,28,339,3144,3170
> Alps Electric Corporation#--#9562165739,12,36,447,477,503
> 
> My machine was set to the en_GB locale, although I had switched this to
> en_US with the same (wrong) results.
> 
> It was necessary both to set LC_ALL=C and to instruct SORT to look
> only at the first field (up to the first #) using the -t'#' and
> -k1,1 switches as you mentioned.
> Obvious really, but needing the two together is what caused my confusion.
> 
> It is really worth reading up on locales for anybody using SORT and other
> utilities, as the locale can profoundly change the results of an operation.
> Even setting the locale to en_US doesn't help, as I read in the FAQ you
> linked, because en_US collation largely ignores case and punctuation
> when comparing lines.
> 
> I'm sorry for the bother - but you put me on the right track.
> Many thanks for that.

Glad to hear it.  As such, I've closed the bug in the tracker.
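
For anyone landing here from a search, the fix described above can be
sketched roughly as follows. The file names titles.txt and
titles.sorted.txt are placeholders, and the input is just the three
example lines quoted above in shuffled order; treat it as an
illustration rather than the reporter's exact command line.

```shell
# Placeholder input: the three sample lines from the report, shuffled.
printf '%s\n' \
  'Alps Electric Corporation#--#9562165739,12,36,447,477,503' \
  'Alps Electric#--#7701298893,12,24,364,394,420' \
  'Alps Electric Co.#--#4280442890,12,28,339,3144,3170' > titles.txt

# LC_ALL=C makes sort compare raw bytes instead of using locale collation.
# -t'#' sets the field separator; -k1,1 limits the key to the title field.
LC_ALL=C sort -t'#' -k1,1 titles.txt > titles.sorted.txt

cat titles.sorted.txt
```

With the key limited to the first field, "Alps Electric" sorts ahead of
"Alps Electric Co." because the shorter title is a prefix of the longer
one. Without -k1,1 the whole line is compared, and in byte order a space
(0x20) sorts before '#' (0x23), so "Alps Electric Co.#--#..." would come
first instead.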

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


