bug-coreutils

Re: 'sort' bug


From: Bob Proulx
Subject: Re: 'sort' bug
Date: Fri, 30 May 2008 00:01:50 -0600
User-agent: Mutt/1.5.13 (2006-08-11)

Mike Markowski wrote:
> I think I've come across a bug in 'sort'.  Using the attached file (please 
> let me know if the attachment is stripped from this email), I tried to sort 
> on the 5th column of states/countries by using:
> 
>    sort -k 5 c3
> 
> The first few lines look like:
> 
>    10-Apr-2008  W7GVE        729C Ed         AZ             10.120
>    18-May-2008  W1GUE        1998 Ed         NH              7.055
>    28-Apr-2008  KG4W        2416T Ed         VA              7.055
>    11-May-2008  K4ZGB        796T Tom        AL              7.055
>    16-May-2008  9A2VJ        2533 Vel        CROATIA        14.052
>    [...]
> 
> already not properly sorted by state/country.

I think you have missed that, unless you specify -b, the leading blanks
are part of each field.

  `-t SEPARATOR'
  `--field-separator=SEPARATOR'
       Use character SEPARATOR as the field separator when finding the
       sort keys in each line.  By default, fields are separated by the
       empty string between a non-blank character and a blank character.
       That is, given the input line ` foo bar', `sort' breaks it into
       fields ` foo' and ` bar'.  The field separator is not considered
       to be part of either the field preceding or the field following,
       so with `sort -t " "' the same input line has three fields: an
       empty field, `foo', and `bar'.  However, fields that extend to the
       end of the line, as `-k 2', or fields consisting of a range, as
       `-k 2,3', retain the field separators present between the
       endpoints of the range.

Therefore, because "Ed" is one character shorter than "Tom" and "Vel",
field 5 of the first three lines starts with one more space than field
5 of the later lines.  The fields being sorted are:

  "         AZ             10.120"
  "         NH              7.055"
  "         VA              7.055"
  "        AL              7.055"
  "        CROATIA        14.052"
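
A minimal reproduction, assuming a POSIX shell and GNU sort.  The five
rows quoted in the report stand in for the attached file c3 (the
function name "sample" is hypothetical), and LC_ALL=C is set so that
blanks compare by byte value; other locales may collate blanks
differently.

```shell
#!/bin/sh
# Hypothetical stand-in for the attached file "c3": the rows quoted above.
sample() {
  printf '%s\n' \
    '10-Apr-2008  W7GVE        729C Ed         AZ             10.120' \
    '18-May-2008  W1GUE        1998 Ed         NH              7.055' \
    '28-Apr-2008  KG4W        2416T Ed         VA              7.055' \
    '11-May-2008  K4ZGB        796T Tom        AL              7.055' \
    '16-May-2008  9A2VJ        2533 Vel        CROATIA        14.052'
}

# Compare bytes, so a space (0x20) sorts before any letter.
export LC_ALL=C

# Without -b the key for -k 5 begins with the blanks before the name.
# The "Ed" rows have 9 leading blanks, the others 8, so at the 9th
# character a space meets a letter and the "Ed" rows all sort first.
sample | sort -k 5
```

The order comes out AZ, NH, VA, AL, CROATIA, which is exactly the
input order reported above.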
  
That should illustrate the issue.  The resulting order is correct for
the key as it has been specified.  You probably also want to say where
the sort key ends, because, as you can see, -k5 sorts from the start of
field 5 to the end of the line.  That means that with -b -k5 you are
sorting on these strings:

  "AZ             10.120"
  "NH              7.055"
  "VA              7.055"
  "AL              7.055"
  "CROATIA        14.052"

But with -b -k5,5 you would be sorting upon these strings:

  "AZ"
  "NH"
  "VA"
  "AL"
  "CROATIA"

That is probably what you want.  You may or may not also want the -s
option, which disables the last-resort comparison of the entire line so
that lines with equal keys keep their input order.
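
A sketch of the suggested invocations, under the same assumptions as
before (POSIX shell, GNU sort, LC_ALL=C, and a hypothetical "sample"
function holding the quoted rows in place of c3).  With the real file
the command would be "sort -b -k5,5 c3".

```shell
#!/bin/sh
# Hypothetical stand-in for the attached file "c3".
sample() {
  printf '%s\n' \
    '10-Apr-2008  W7GVE        729C Ed         AZ             10.120' \
    '18-May-2008  W1GUE        1998 Ed         NH              7.055' \
    '28-Apr-2008  KG4W        2416T Ed         VA              7.055' \
    '11-May-2008  K4ZGB        796T Tom        AL              7.055' \
    '16-May-2008  9A2VJ        2533 Vel        CROATIA        14.052'
}
export LC_ALL=C

# -b skips the blanks at the start of the key; -k5,5 ends the key at the
# end of field 5, so only the state/country name is compared.
sample | sort -b -k5,5

# -s additionally makes the sort stable: rows with equal keys keep their
# input order instead of being ordered by a whole-line comparison.
sample | sort -s -b -k5,5
```

Both print the rows in the order AL, AZ, CROATIA, NH, VA.  The -s
variant only differs when several rows share the same state/country.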

> Yet, doing:
> 
>    sort -n -k 6 c3
> 
> works as expected.

Numeric sorting (-n) skips leading blanks, just as -b does for a
character sort.
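
A small check of that point, again assuming a POSIX shell, GNU sort,
LC_ALL=C and the hypothetical "sample" function standing in for c3:
adding -b to the numeric sort changes nothing, because -n already
ignores the blanks in front of the number.

```shell
#!/bin/sh
# Hypothetical stand-in for the attached file "c3".
sample() {
  printf '%s\n' \
    '10-Apr-2008  W7GVE        729C Ed         AZ             10.120' \
    '18-May-2008  W1GUE        1998 Ed         NH              7.055' \
    '28-Apr-2008  KG4W        2416T Ed         VA              7.055' \
    '11-May-2008  K4ZGB        796T Tom        AL              7.055' \
    '16-May-2008  9A2VJ        2533 Vel        CROATIA        14.052'
}
export LC_ALL=C

# Numeric sort on field 6; the leading blanks of the key are skipped.
sample | sort -n -k 6

# Same result with an explicit -b.
sample | sort -b -n -k 6
```

Both print field 6 in the order 7.055, 7.055, 7.055, 10.120, 14.052;
the three 7.055 rows are ordered among themselves by the last-resort
whole-line comparison.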

Bob



