bug-coreutils

bug#7323: sort bug


From: Eric Blake
Subject: bug#7323: sort bug
Date: Wed, 03 Nov 2010 09:28:58 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.1.6-1.fc14 Mnenhy/0.8.3 Thunderbird/3.1.6

On 11/03/2010 08:52 AM, Thomas A Schweiger wrote:
> sort -t\| -k 10  source.dat

Most likely a bug in your usage, and not in sort.

> 
> 
> I get the following result:
> 
> 
> 7|1|1||MARY||JONES   |||19610202|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 5|1|1||MARY||JONES   |||19610203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 3|1|1||Andy||smith   |||19610203|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701| ||| |||||TEST||
> 2|1|1||Terry| |Willis|||19610204|||||| 315 | E | Sutton | Street | || |Fayetteville|AR|72701 | ||| |||||TEST||
> 10|1|1||Robert|W|Travillian|||19610222|||||| 249 ||Murdoch|Street||||||51035| ||| |||||TEST||
> 11|1|1||Robert|W|Travillian|||19610222|||||||||||||||| ||| |||||TEST||
> 4|1|1||Andy||smith   |||19610302|||||| 315 | | Willow | Street | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 8|1|1||MARY||JONES   |||19615292|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 6|1|1||MARY||JONES   |||19660203|||||| 1400 |N | eastwood | drive | || |Fayetteville|AR| 72701 | ||| |||||TEST||
> 
> 
> Note in particular the location of record 9.

Where did you expect it to appear?  The latest coreutils 8.6 release
includes a --debug option that makes it more obvious what you did wrong
(I'm trimming down your example to a bare minimum):

$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | src/sort --debug -t\| -k2
src/sort: using `en_US.UTF-8' sorting rules
5|19610203|||||| 1400 |
 _____________________
______________________
9|1961020|||||| 315 |
 ___________________
____________________
1|19610203|||||| 315 |
 ____________________
_____________________

Notice that in the en_US.UTF-8 locale, punctuation and blanks do NOT
affect collation order.  And since you explicitly requested that your
key start at field 10 (field 2 in my trimmed example) and extend to the
end of the line, 1961020315 (from row 9) collates less than 19610203315
(from row 1).
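A rough way to see what the locale is effectively comparing is to delete the '|' delimiters and blanks yourself and then sort the leftovers byte-wise. This is only an approximation of glibc's en_US.UTF-8 rules (which skip such characters at the primary comparison level rather than literally deleting them), and approx_keys is a made-up helper name for illustration:

```shell
#!/bin/sh
# approx_keys (hypothetical helper): for each "row|rest" line, strip '|' and
# spaces from fields 2..end, print "key row", and sort those keys byte-wise.
# This only approximates en_US.UTF-8 collation for this particular data.
approx_keys() {
  while IFS= read -r line; do
    row=$(printf '%s\n' "$line" | cut -d'|' -f1)
    key=$(printf '%s\n' "$line" | cut -d'|' -f2- | tr -d '| ')
    printf '%s %s\n' "$key" "$row"
  done | LC_ALL=C sort
}

printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' |
  approx_keys
```

The resulting order (row 5, then 9, then 1) matches the --debug run above, which is why row 9 lands between two 19610203 records.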

But if you instead request byte-wise sorting and restrict your key to
JUST that field, you get the results I assume you were expecting:

$ printf '5|19610203|||||| 1400 |\n9|1961020|||||| 315 |\n1|19610203|||||| 315 |\n' | LC_ALL=C src/sort --debug -t\| -k2,2
src/sort: using simple byte comparison
9|1961020|||||| 315 |
  _______
_____________________
1|19610203|||||| 315 |
  ________
______________________
5|19610203|||||| 1400 |
  ________
_______________________
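Applied to the original file, the same fix would look roughly like the sketch below. The sample records are a few of the quoted rows with their trailing address fields trimmed, and sample_records and sort_by_date_field are hypothetical helper names, not anything shipped with coreutils:

```shell
#!/bin/sh
# Byte-wise sort on field 10 only: -k10,10 ends the key at the end of field
# 10, and LC_ALL=C makes sort compare raw bytes instead of using the locale's
# collation rules.  For the real file this would be:
#   LC_ALL=C sort -t'|' -k10,10 source.dat
sort_by_date_field() {
  LC_ALL=C sort -t'|' -k10,10 "$@"
}

# A few of the quoted records, deliberately out of order.
sample_records() {
  printf '%s\n' \
    '6|1|1||MARY||JONES   |||19660203|||||| 1400 |N | eastwood | drive |' \
    '1|1|1||Terry|a|Willis|||19610203|||||| 315 | E | Sutton | Street |' \
    '7|1|1||MARY||JONES   |||19610202|||||| 1400 |N | eastwood | drive |' \
    '9|1|1||Terry|a|Willis|||1961020|||||| 315 | E | Sutton | Street |'
}

sample_records | sort_by_date_field
```

With the key limited to field 10, the trailing address fields no longer take part in the comparison, and LC_ALL=C makes any punctuation inside the key significant again.  Record 9 (1961020) sorts first because it is a byte-wise prefix of the eight-digit dates.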


> The information contained in this communication is confidential,

It is considered poor netiquette to send emails to publicly archived
lists with disclaimers like this, since the very nature of public
archival makes this clause unenforceable.  You are better off using a
secondary account that does not add your employer's disclaimer on the end.

-- 
Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
