From: The Wanderer
Subject: Re: Apparently irrational behaviour in sort
Date: Sun, 04 Dec 2005 11:56:21 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050922
Andreas Schwab wrote:
> The Wanderer <address@hidden> writes:
>
>> I have a text file whose lines each contain two dates, of the format
>> "MM-DD-YYYY". I want to sort these lines into order from oldest to
>> most recent - that is, first by YYYY, then by MM, then by DD.
>>
>> After I parse out the spaces using sed, the only whitespace remaining
>> in the file is the blocks of contiguous tabs which I use to divide
>> the columns which contain the actual data; the date I want to sort on
>> is in the fifth such column, which should at that point be field
>> number 5.
>>
>> I would expect that 'sort -k 5.7,5 -k 5,5' would sort first by the
>> seventh character in the fifth field (the first digit of the year),
>> thus putting the file in order by year, and then by the entirety of
>> the fifth field, thus putting the file in order by month and day. It
>> does not.
>
> This is because the whitespace between the fields belongs to the
> following field, ie. the field boundary is the point between a
> non-whitespace and a whitespace character. Thus you either have to
> increase the offset by the number of whitespace characters at the
> start of the field (if it is consistent), or use the b flag to skip
> whitespace altogether.
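
The behaviour described above can be reproduced with a small sketch. It assumes GNU sort run in the C locale, and the sample data is hypothetical: two columns instead of the five in the original file, with the date in field 2 and a varying number of tabs used for alignment. The 'b' modifier on each key's start position is what makes character offsets count from the first non-blank character of the field.

```shell
#!/bin/sh
# Hypothetical sample: a label, a varying run of tabs, then MM-DD-YYYY.
data=$(printf 'alpha\t03-10-2005\nb\t\t\t11-02-2003\ncc\t\t07-21-2004\ndddd\t01-05-2004')

# Without 'b', the leading tabs are part of field 2, so "2.7" lands on a
# different character of the date depending on how many tabs precede it.
naive=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.7,2 -k 2,2)

# With 'b', leading blanks are skipped before counting characters, so
# "2.7b" always starts at the first digit of the year.
fixed=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2.7b,2 -k 2b,2)

printf 'without b:\n%s\n\nwith b:\n%s\n' "$naive" "$fixed"
```

With the 'b' modifier the lines come out ordered by year and then by month and day (2003, then the two 2004 entries with January before July, then 2005), regardless of how many tabs separate the columns.
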
Hah. That did, in fact, do it. (The whitespace is not and cannot be consistent in length, because the entire point of it is to align the data - not all of which is of fixed length - into neat columns.)

I would suggest that it might be a good idea to provide a more detailed explanation of this behaviour in the documentation. From the man page, I did not even figure out that "the whitespace between the fields belongs to the following field", much less exactly what the -b option would do. A more in-depth explanation would have saved me hours of yanking at my hair in frustration.

If the somewhat abbreviated nature of the man page is intentional, then at the least it might be a good idea to provide an explanation in the online coreutils FAQ.

I might be interested, at least academically, in the reasoning behind the decision to make the whitespace part of the following field rather than either the preceding one or neither; do you have any idea when the discussion which led to that decision might have taken place?

Regardless, thank you very much for the answer, simple though it might be.

-- 
The Wanderer

Warning: Simply because I argue an issue does not mean I agree with any side of it.

Secrecy is the beginning of tyranny.