Re: [coreutils] added ability in sort to skip n number of lines for each

From: Pádraig Brady
Subject: Re: [coreutils] added ability in sort to skip n number of lines for each file
Date: Mon, 22 Nov 2010 22:21:32 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/20100227 Thunderbird/3.0.3

On 22/11/10 17:28, Pádraig Brady wrote:
> On 18/11/10 16:36, Jim Hester wrote:
>> A common problem when sorting files stems from the file containing 1
>> or more header lines, which should not be sorted.  As of now, the
>> common solution to this problem is to remove the header lines
>> manually, or to output only the non-header lines with tail, awk, or
>> some other program and pipe the results to sort.
> Thanks for the patch!
>> This was likely not
>> deemed a problem when sort was only single threaded, as the printing
>> and pipe was likely still faster than the sort itself.  However with
>> multi-threaded sort this results in the operation bottlenecking,
>> waiting for more information from the pipe.
> I'm not following the argument above.
> One can always print the header synchronously?
> I.e. the `head` below is guaranteed to run before the `sort`:
> printf "z_header\nb\na\n" > file
> (head -n1 file; sort <(tail -n+2 file) <(tail -n+2 file))
> Now the above is awkward and dependent on bash
> (constructs per file), so your idea has some merit I think.

Note that the --header option is especially useful for `join`
because it transforms its input. sort does not, and
so might be amenable to a more general solution.
Perhaps something like:

(head --no-header -n1 file.* | head -n1; tail --no-header -n+2 file.* | sort)

I.e. add a --no-header option to `head` and `tail` to suppress the
==> file name <== annotations they print when given multiple files,
which would allow using `head` and `tail` in general for this.
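
For comparison, here is a minimal sketch of the same pipeline using -q
(--quiet/--silent), which GNU head and tail already accept to suppress
the file name annotations. It assumes GNU coreutils; the scratch
directory and the file.1/file.2 names and contents are made up purely
for illustration, and the --no-header spelling above remains only a
proposal.

```shell
# Scratch directory with two hypothetical input files sharing one header.
tmp=$(mktemp -d)
printf 'z_header\nb\na\n' > "$tmp/file.1"
printf 'z_header\nd\nc\n' > "$tmp/file.2"

# Emit the header once, then sort the bodies of all files together.
# -q (GNU head/tail) suppresses the "==> name <==" annotations.
result=$(head -q -n1 "$tmp"/file.* | head -n1
         tail -q -n+2 "$tmp"/file.* | sort)
printf '%s\n' "$result"

rm -r "$tmp"
```

The header line is taken from the first file only, so this only makes
sense when every input file carries the same header.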

