Re: --linebuffer implemented
From: Ole Tange
Subject: Re: --linebuffer implemented
Date: Sat, 14 Jun 2014 14:16:56 +0200
On Tue, Jul 30, 2013 at 5:20 PM, Ole Tange <tange@gnu.org> wrote:
> --linebuffer guards against having half a line coming from job1 and
> half a line coming from job2, but prints out when it has a full line.
> It is slower than --group and much slower than --ungroup.
Alas, I was wrong. This is what happens when your intuition tells you
something and you do not measure to check if it is true.
My intuition says: Polling for a new line is bound to be slower than
being able to read the whole file in big chunks.
But reality is more complex: In CPU intensive jobs with little output
this will be true (because polling costs some CPU power), but it is
not necessarily true for jobs with lots of output:
yes 1234567890123456789012345678901234567890 | head -n 100000000
generates around 4 GB of output.
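The size can be sanity-checked without generating the data. This is only a sketch, and the variable names are illustrative: each line is 40 characters plus a newline, so the total is the line size times the line count.

```shell
# One line as printed by `yes`: the 40-character string plus a newline.
line=1234567890123456789012345678901234567890
lines=100000000

# Measure the bytes in a single line, then scale up to the full run.
bytes_per_line=$(( $(printf '%s\n' "$line" | wc -c) ))
total_bytes=$(( bytes_per_line * lines ))

echo "$bytes_per_line bytes per line, $total_bytes bytes in total"
```

That comes to 4100000000 bytes, i.e. about 4.1 GB.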
On a system with 4 GB RAM this:
time parallel yes {} '| head -n 100000000' ::: 1234567890123456789012345678901234567890 | wc -c
takes:
real 2m58.825s
user 1m12.148s
sys 0m8.472s
But with --line-buffer:
time parallel --line-buffer yes {} '| head -n 100000000' ::: 1234567890123456789012345678901234567890 | wc -c
it takes:
real 2m4.762s
user 1m13.888s
sys 0m7.948s
In the first example the time is spent writing 4 GB to disk and
reading 4 GB back from disk. In the second example the 4 GB is still
written to disk, but it is never read back from disk: --line-buffer
manages to read everything from the RAM cache. The extra polling costs
us 1.7s of user CPU time - time well spent if the overall saving is 54s.
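The 1.7s and 54s figures come straight from the two timing reports above. A small sketch of the arithmetic follows; to_seconds and diff_seconds are hypothetical helper names and not part of GNU parallel.

```shell
# to_seconds 2m58.825s: convert the "XmY.YYYs" format printed by the
# shell's time keyword into plain seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# diff_seconds A B: print A minus B in seconds.
diff_seconds() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", a - b }'
}

real_saved=$(diff_seconds 2m58.825s 2m4.762s)    # default run minus --line-buffer run
user_extra=$(diff_seconds 1m13.888s 1m12.148s)   # --line-buffer run minus default run
sys_saved=$(diff_seconds 0m8.472s 0m7.948s)      # default run minus --line-buffer run

echo "wall-clock saved: ${real_saved}s"
echo "extra user CPU:   ${user_extra}s"
echo "sys CPU saved:    ${sys_saved}s"
```

This gives 54.063s of wall-clock time saved for 1.740s of extra user CPU time. The sys time is 0.524s lower in the --line-buffer run, so the net CPU cost is a little over 1.2s.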
In other words: --line-buffer can sometimes be faster, and if your
post-processing program (in my case 'wc -c') reads a line at a time
and prefers getting data streamed instead of in chunks, then that may
speed up your processing even further. If the data is compressible,
--compress can also speed up processing. The fastest is still
--ungroup, but that may mix half-lines from different jobs.
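To compare the modes on your own system, something like the following sketch may help. It assumes GNU parallel is installed. run_mode and LINES_PER_JOB are hypothetical names, and the default size is kept small so the script finishes quickly. The timings above came from output about as large as RAM, so a small run mostly checks that the pipeline works; raise LINES_PER_JOB to see any disk effect.

```shell
LINE=1234567890123456789012345678901234567890
LINES_PER_JOB=${LINES_PER_JOB:-100000}

# run_mode MODE [N]: run one job under the given GNU parallel output
# option and print how many bytes arrive downstream.
run_mode() {
    mode=$1
    n=${2:-$LINES_PER_JOB}
    parallel "$mode" yes {} "| head -n $n" ::: "$LINE" | wc -c | tr -d ' '
}

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    for m in --group --line-buffer --ungroup; do
        start=$(date +%s)
        bytes=$(run_mode "$m")
        end=$(date +%s)
        # date +%s has one-second resolution, so small runs will show 0s.
        echo "$m: $bytes bytes in $((end - start))s"
    done
else
    echo "GNU parallel not found; skipping" >&2
fi
```

With bash, `time run_mode --line-buffer 100000000` gives finer-grained timings than the one-second resolution of `date +%s`.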
Lesson re-learned: Don't trust your intuition if you can measure.
/Ole