Re: [coreutils] added ability in sort to skip n number of lines for each

From:	Pádraig Brady
Subject:	Re: [coreutils] added ability in sort to skip n number of lines for each file
Date:	Tue, 23 Nov 2010 16:21:07 +0000
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100227 Thunderbird/3.0.3

On 23/11/10 15:57, Jim Hester wrote:
> Below I have an updated proper patch, it is quite a bit larger than my
> first, but should address all of the concerns from Assaf and Pádraig.
> 
> My main motivation here is not just to make this common operation less
> annoying, it was mostly for increased performance.  I made a test
> dataset of 10 files with 3 header lines each and 500,000 lines to sort,
> then ran sort by using head and tail as Pádraig suggests, and then again
> using my implemented header skip on an 8 core machine.  Larger files
> seem to show similar speed up as well.  I believe this speedup comes
> from the fact that the multithreaded sort is trying to read from the
> buffer faster than tail can write to the buffer.
> 
>>time { (head -q -n 3 test[0-9] | head -n 3; tail -q -n+4 test[0-9] |
> ./sort -n ) > out2; }
> 
> real    0m51.660s
> user    2m0.324s
> sys     0m4.115s
> 
>>time ./sort -n -l 3 test[0-9] > out
> 
> real    0m31.834s
> user    2m17.775s
> sys     0m3.981s
>>diff out out2

The user time for the head/tail|sort pipeline
is lower than for sort -l, which suggests that
the first invocation was mostly just waiting on disk?

Could you please repeat the test using precached data?
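
One way to precache, as a minimal sketch: read every input file once
and discard the output, so later reads are served from the page cache
rather than the disk. The helper name `precache` is made up for
illustration. The file names and the `-l` option are the ones from
your test (`-l` exists only in the proposed patch).

```shell
#!/bin/sh
# precache: hypothetical helper, not part of coreutils.
# Reads each named file once and throws the data away, which leaves
# the contents in the kernel page cache if memory allows.
precache() {
    cat -- "$@" > /dev/null
}

# Intended use before each timed run (commented out because test[0-9]
# and the patched ./sort exist only in the setup from this thread):
#   precache test[0-9]
#   time ./sort -n -l 3 test[0-9] > out
```

Running both variants after the same precache step should show whether
the difference in real time survives once disk waits are out of the
picture.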

Currently the threads in `sort` are passed data that is read
sequentially from the input files. Otherwise `sort`
would have to start worrying about device ids,
/sys/block/<blockdev>/queue/rotational etc.,
so as to not thrash disk heads. That kind of
logic is probably always best kept outside of `sort`.
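
For illustration, a wrapper script outside of `sort` could consult the
rotational flag roughly as follows. This is only a sketch: the function
name `is_rotational` and its optional second argument (an alternate
sysfs root, handy for testing) are made up here, and the device name
has to be supplied by the caller.

```shell
#!/bin/sh
# is_rotational DEV [SYSFS_ROOT]
# Hypothetical helper. Exit status:
#   0  the queue/rotational flag for DEV is 1 (spinning disk)
#   1  the flag is anything else (e.g. 0 for an SSD)
#   2  the flag file is missing or unreadable
is_rotational() {
    dev=$1
    root=${2:-/sys/block}
    flag_file="$root/$dev/queue/rotational"
    [ -r "$flag_file" ] || return 2
    [ "$(cat "$flag_file")" = "1" ]
}

# Example use (the device name sda is an assumption):
#   if is_rotational sda; then
#       echo "rotational disk: keep reads sequential"
#   fi
```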

cheers,
Pádraig.

Current Thread

[coreutils] added ability in sort to skip n number of lines for each file, Jim Hester, 2010/11/18
- Re: [coreutils] added ability in sort to skip n number of lines for each file, Pádraig Brady, 2010/11/22
  - Re: [coreutils] added ability in sort to skip n number of lines for each file, Pádraig Brady, 2010/11/22
    - Re: [coreutils] added ability in sort to skip n number of lines for each file, Pádraig Brady, 2010/11/22
    - Re: [coreutils] added ability in sort to skip n number of lines for each file, Jim Hester, 2010/11/23
    - Re: [coreutils] added ability in sort to skip n number of lines for each file, Pádraig Brady <=
- Re: [coreutils] added ability in sort to skip n number of lines for each file, Assaf Gordon, 2010/11/22
  - Re: [coreutils] added ability in sort to skip n number of lines for each file, Assaf Gordon, 2010/11/22
