Re: Apparently irrational behaviour in sort

From: The Wanderer
Subject: Re: Apparently irrational behaviour in sort
Date: Sun, 04 Dec 2005 11:56:21 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922

Andreas Schwab wrote:

> The Wanderer <address@hidden> writes:
>
>> I have a text file whose lines each contain two dates, of the
>> format "MM-DD-YYYY". I want to sort these lines into order from
>> oldest to most recent - that is, first by YYYY, then by MM, then by
>> DD. After I parse out the spaces using sed, the only whitespace
>> remaining in the file is the blocks of contiguous tabs which I use
>> to divide the columns which contain the actual data; the date I
>> want to sort on is in the fifth such column, which should at that
>> point be field number 5.
>>
>> I would expect that 'sort -k 5.7,5 -k 5,5' would sort first by the
>> seventh character in the fifth field (the first digit of the year),
>> thus putting the file in order by year, and then by the entirety of
>> the fifth field, thus putting the file in order by month and day.
>> It does not.

> This is because the whitespace between the fields belongs to the
> following field, i.e. the field boundary is the point between a
> non-whitespace and a whitespace character.  Thus you either have to
> increase the offset by the number of whitespace characters at the
> start of the field (if it is consistent), or use the b flag to skip
> whitespace altogether.

Hah. That did, in fact, do it. (The whitespace is not and cannot be
consistent in length, because the entire point of it is to align the
data - not all of which is of fixed length - into neat columns.)
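
A minimal sketch of the fix described above, assuming GNU sort and the
default field separation (no -t option). The sample lines and the
variable names are invented for illustration and are not taken from the
actual file; LC_ALL=C is added only to make the comparison byte-wise and
reproducible.

```shell
# Invented sample: columns separated by runs of tabs of varying length,
# with the MM-DD-YYYY date to sort on in field 5.
sample=$(printf 'a\tb\tc\t01-01-2000\t\t12-25-2004\nlonger\t\tb\tc\t01-01-2000\t03-15-2005\na\tb\tc\t01-01-2000\t\t\t01-10-2005\n')

# Without -b the leading tabs are counted as characters of field 5,
# so offset 5.7 lands on a different part of the date in each line.
naive=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 5.7,5 -k 5,5)

# With -b the leading blanks are skipped before the offset is counted,
# so 5.7 is the first digit of the year in every line.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -b -k 5.7,5 -k 5,5)

printf '%s\n' "$sorted"
```

The first key runs from character 7 to the end of field 5 (the year);
the second key compares the whole field, which breaks ties within a year
by month and then day.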

I would suggest that it might be a good idea to provide a more detailed
explanation of this behaviour in the documentation. From the man page, I
did not even figure out that "the whitespace between the fields belongs
to the following field", much less exactly what the -b option would do.
A more in-depth explanation would have saved me hours of yanking at my
hair in frustration. If the somewhat abbreviated nature of the man page
is intentional, then at the least it might be a good idea to provide an
explanation in the online coreutils FAQ.

I might be interested, at least academically, in the reasoning behind
the decision to make the whitespace part of the following field rather
than either the preceding one or neither; do you have any idea when the
discussion which led to that decision might have taken place?

Regardless, thank you very much for the answer, simple though it might
be.

      The Wanderer

Warning: Simply because I argue an issue does not mean I agree with any
side of it.

Secrecy is the beginning of tyranny.
