
Re: --linebuffer implemented


From: Ole Tange
Subject: Re: --linebuffer implemented
Date: Sat, 14 Jun 2014 14:16:56 +0200

On Tue, Jul 30, 2013 at 5:20 PM, Ole Tange <tange@gnu.org> wrote:

> --linebuffer guards against having half a line coming from job1 and
> half a line coming from job2, but prints out when it has a full line.
> It is slower than --group and much slower than --ungroup.

Alas, I was wrong. This is what happens when your intuition tells you
something and you do not measure to check if it is true.

My intuition says: Polling for a new line is bound to be slower than
being able to read the whole file in big chunks.

But reality is more complex: in CPU-intensive jobs with little output
this will be true (because polling costs some CPU power), but it is
not necessarily true for jobs with lots of output:

  yes 1234567890123456789012345678901234567890 | head -n 100000000

generates around 4 GB of output.
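A quick sanity check of that figure, scaled down to 1000 lines so it runs instantly:

```shell
# Each line is the 40-digit string plus a newline = 41 bytes, so
# 100,000,000 lines come to 4,100,000,000 bytes - around 4 GB.
# Scaled down to 1000 lines: 1000 * 41 = 41000 bytes.
yes 1234567890123456789012345678901234567890 | head -n 1000 | wc -c
```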

On a system with 4 GB RAM this:

  time parallel yes {} '| head -n 100000000' ::: 1234567890123456789012345678901234567890 | wc -c

takes:

  real    2m58.825s
  user    1m12.148s
  sys     0m8.472s

But with --line-buffer:

  time parallel --line-buffer yes {} '| head -n 100000000' ::: 1234567890123456789012345678901234567890 | wc -c

it takes:

  real    2m4.762s
  user    1m13.888s
  sys     0m7.948s

In the first example the time is spent writing 4 GB to disk and
reading 4 GB back from disk. In the second example the 4 GB is written
to disk, but it is never read back: --line-buffer manages to read
everything from the RAM cache. The extra polling costs us 1.7s of CPU
time - time well spent if the overall saving is 54s.
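The arithmetic behind those two numbers can be checked directly; these are just the `real` and `user` figures above converted to seconds:

```shell
# real: 2m58.825s - 2m4.762s  = wall-clock saving
# user: 1m13.888s - 1m12.148s = extra CPU spent polling
awk 'BEGIN { printf "%.3f %.3f\n", 178.825 - 124.762, 73.888 - 72.148 }'
# prints "54.063 1.740"
```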

In other words: --line-buffer can sometimes be faster, and if your
post-processing program (in my case 'wc -c') reads a line at a time
and prefers getting data streamed instead of in chunks, it may
speed up your processing even further. If data is compressible,
--compress can also speed up processing. The fastest is still
--ungroup, but that may mix half-lines from different jobs.

Lesson re-learned: Don't trust your intuition if you can measure.


/Ole


