Re: What is the 'associated field'? (about sort)
From: Eric Blake
Subject: Re: What is the 'associated field'? (about sort)
Date: Tue, 05 Jul 2011 08:30:22 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.10
On 07/01/2011 07:22 PM, Peng Yu wrote:
> Hi,
>
> The following explanation for coreutils manual is not very clear.
>
> "Also note that the ‘n’ modifier was applied to the field-end
> specifier for the first key. It
> would have been equivalent to specify ‘-k 2n,2’ or ‘-k 2n,2n’. All
> modifiers except ‘b’
> apply to the associated field, regardless of whether the modifier
> character is attached
> to the field-start and/or the field-end part of the key specifier."
Maybe it also helps to read the POSIX wording for this same feature:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html
The keydef argument is a restricted sort key field definition. The
format of this definition is:
field_start[type][,field_end[type]]
where field_start and field_end define a key field restricted to a
portion of the line (see the EXTENDED DESCRIPTION section), and type is
a modifier from the list of characters 'b', 'd', 'f', 'i', 'n', 'r'. The
'b' modifier shall behave like the -b option, but shall apply only
to the field_start or field_end to which it is attached. The other
modifiers shall behave like the corresponding options, but shall apply
only to the key field to which they are attached; they shall have this
effect if specified with field_start, field_end, or both. If any
modifier is attached to a field_start or to a field_end, no option shall
apply to either.
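The last sentence of that POSIX text is easy to overlook, so here is a
small sketch of it. It assumes GNU coreutils sort, and the two-line input
is made up for illustration. A key with no modifiers inherits the global
options, while a key with any modifier of its own ignores them.

```shell
#!/bin/sh
# Assumes GNU coreutils sort; the sample input is invented for illustration.
input='1
2'

# The key has no modifiers, so it inherits the global -n and -r.
inherited=$(printf '%s\n' "$input" | LC_ALL=C sort -n -r -k1,1)

# The key carries its own 'n', so the global -r does not apply to it.
not_inherited=$(printf '%s\n' "$input" | LC_ALL=C sort -r -k1,1n)

printf '%s\n' 'sort -n -r -k1,1:' "$inherited"
printf '%s\n' 'sort -r -k1,1n:' "$not_inherited"
```

The first command should print 2 before 1, the second 1 before 2. To
reverse a key that has its own modifiers, put the r on the key itself,
as in -k1,1nr.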
>
> According to the manual and the following output, '-k 1,2n' is the
> same as '-k 1n,2' and '-k 1n,2n'. But isn't this syntax a little
> confusing? Shouldn't '-k 1n,2n' be the same as '-k1,1n -k2,2n'?
No. '-k 1n,2' says to treat the combination of fields 1 and 2 as a
single numeric string, and is generally not what you want. Meanwhile,
'-k 1n,1 -k 2n,2' says to treat both field 1 and field 2 as numeric
strings, where field 2 is used to break ties when field 1 compares equal.
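As a rough illustration of the two-key form, assuming GNU coreutils sort
and an invented three-line input, the placement of the n on the start,
the end, or both ends of each key should make no difference:

```shell
#!/bin/sh
# Assumes GNU coreutils sort; the sample input is invented for illustration.
input='2 1
1 10
1 9'

# 'n' on field_start, on field_end, or on both: the whole key is numeric.
on_start=$(printf '%s\n' "$input" | LC_ALL=C sort -k1n,1 -k2n,2)
on_end=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n -k2,2n)
on_both=$(printf '%s\n' "$input" | LC_ALL=C sort -k1n,1n -k2n,2n)

printf '%s\n' "$on_start"
```

All three variables should hold the same ordering: "1 9", "1 10", "2 1".
Field 2 only matters for the two lines whose field 1 compares equal.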
>
> Also I don't understand what "associated field" refers to?
The "associated field" is the whole key defined by a single -k option,
such as -k1,1. Most letters can be written on the start position, the
end position, or both, and the entire key then takes on that option.
But b is special: ignoring blanks at just the start or just the end
makes sense, so it applies only to the half of the key where the b
appears.
Perhaps you might gain further understanding of this by using the
--debug option.
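To see why b is treated per position, here is a minimal sketch. It
assumes GNU coreutils sort in the C locale with the default blank field
separator, and the input is made up: the first line has two blanks
before its second field, the second line has one.

```shell
#!/bin/sh
# Assumes GNU coreutils sort in the C locale; the sample lines are invented.
# Line one has two blanks before field 2, line two has one.
input='a  c
a b'

# No 'b': field 2 keeps its leading blanks, so "  c" sorts before " b".
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2)

# 'b' on field_start: leading blanks are skipped, so "b" sorts before "c".
start_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k2b,2)

printf '%s\n' '-k2,2:' "$plain"
printf '%s\n' '-k2b,2:' "$start_b"
```

Without the b, the line with two blanks should come first; with the b on
the start position, "a b" should come first.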
>
>
>> cat input1.txt
> 1 10
> 1 9
>> sort --key=1,2n input1.txt
> 1 10
> 1 9
$ printf '1 10\n1 9\n' | LC_ALL=C sort --debug -k1,2n
sort: using simple byte comparison
sort: key 1 is numeric and spans multiple fields
1 10
_
____
1 9
_
___
Here, -k1,2n means to sort the single key composed of fields 1 and 2 as
a number (but the number necessarily ends at the end of field 1), with a
fall-back to the lexicographical sort of the entire line. Both lines have
the key value 1, so the fall-back decides: "1 10" sorts before "1 9"
because '1' < '9' lexicographically, even though 10 > 9 numerically.
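One way to confirm that the fall-back comparison is what orders these
lines is to turn it off. This sketch assumes GNU coreutils sort, whose
-s (--stable) option disables the last-resort comparison, and feeds the
two lines in the opposite order from the example above:

```shell
#!/bin/sh
# Assumes GNU coreutils sort; -s (--stable) disables the last-resort
# whole-line comparison, so lines with equal keys keep their input order.
input='1 9
1 10'

# Both keys evaluate to the number 1; the whole-line fall-back reorders them.
with_fallback=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,2n)

# Same key, but ties are left in input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k1,2n)

printf '%s\n' 'sort -k1,2n:' "$with_fallback"
printf '%s\n' 'sort -s -k1,2n:' "$stable"
```

The first command should print "1 10" before "1 9"; the second should
leave the lines in their input order, which shows that the numeric key
alone sees the two lines as equal.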
>> sort --key=1n,2n input1.txt
> 1 10
> 1 9
>> sort --key=1,1n --key=2,2n input1.txt
> 1 9
> 1 10
That's better - you have now separated the two numeric keys, as
evidenced by --debug not warning you about spanning multiple fields:
$ printf '1 10\n1 9\n' | LC_ALL=C ../coreutils/src/sort --debug -k1,1n -k2,2n
../coreutils/src/sort: using simple byte comparison
1 9
_
  _
___
1 10
_
  __
____
--
Eric Blake address@hidden +1-801-349-2682
Libvirt virtualization library http://libvirt.org