Re: Huge consumption of tmpdir while running parallel

From: Ole Tange
Subject: Re: Huge consumption of tmpdir while running parallel
Date: Sat, 14 Jun 2014 12:56:00 +0200

On Sat, Jun 14, 2014 at 2:13 AM, Antoine Drochon (perso) wrote:

> I am running into a disk space issue when I run a parallel command (GNU
> parallel 20140322).
> The pseudo code is as defined below:

Please do not use pseudo code, but make a working example that shows
the problem, as described under REPORTING BUGS in the man page:

       Your bug report should always include:

       · The error message you get (if any).

       · The complete output of parallel --version. If
         you are not running the latest released version
         you should specify why you believe the problem
         is not fixed in that version.

       · A complete example that others can run that
         shows the problem. This should preferably be
         small and simple. A combination of yes, seq,
         cat, echo, and sleep can reproduce most errors.
         If your example requires large files, see if
         you can make them by something like seq 1000000
         > file or yes | head -n 10000000 > file. If
         your example requires remote execution, see if
         you can use localhost.

       · The output of your example. If your problem is
         not easily reproduced by others, the output
         might help them figure out the problem.

       · Whether you have watched the intro videos,
         walked through the tutorial (man
         parallel_tutorial), and read the EXAMPLE
         section in the man page (man parallel - search
         for EXAMPLE:).

> The Bash script performs a dig command and some pure Bash instructions, and
> writes a single line of 50 to 100 characters to stdout.

Then that should never use gigabytes of space on /tmp.

You can try using '--results outdir'. This will create the same files
in outdir as in /tmp, but will not remove them.
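
A minimal sketch of that suggestion follows. It assumes GNU Parallel is
installed; 'echo' stands in for the real script, and the arguments
alpha, beta and gamma are made up for illustration:

```shell
# Sketch only: 'echo' replaces the real job; outdir is a throwaway directory.
outdir=$(mktemp -d)
parallel --results "$outdir" echo 'result for {}' ::: alpha beta gamma >/dev/null

# Every argument gets its own directory containing a stdout and a stderr file.
# Sort the files by size (in KB), largest last, to spot the job that
# produced the most output.
find "$outdir" -type f -exec du -k {} + | sort -n | tail
```

With a single input source the files should end up under paths such as
outdir/1/alpha/stdout, so the argument responsible for a large file can
be read straight from the path.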

> I interrupted the execution and I assume Parallel properly trapped the signal
> to clean up the temporary directory. I got back the 15 GB.
> Note: I was unable to see any temporary file in the tmpdir directory.

This is a feature: GNU Parallel uses tempfiles that are removed
immediately, but kept open. This way no matter how GNU Parallel may
die, the cleanup will be done by the OS. The unfortunate and surprising
side effect is that your disk may fill up even though you cannot see
any files taking up the space.
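
The effect can be reproduced without GNU Parallel. The sketch below only
mimics the behaviour in plain shell and is not GNU Parallel's own code;
it assumes Linux (it reads /proc) and GNU stat:

```shell
# Sketch only: mimic a tempfile that is removed immediately but kept open.
tmp=$(mktemp)
exec 3<>"$tmp"      # keep the file open on file descriptor 3
rm "$tmp"           # unlink it: the name is gone, the data stays allocated

yes | head -n 1000000 >&3   # write "y\n" one million times (2,000,000 bytes)

# The file no longer shows up in the directory...
if [ -e "$tmp" ]; then visible=yes; else visible=no; fi
# ...but it still has a size and still occupies disk space.
size=$(stat -L -c %s /proc/self/fd/3)
echo "visible=$visible size=$size"   # prints: visible=no size=2000000

exec 3>&-           # closing the descriptor lets the OS reclaim the space
```

On a live system, 'lsof +L1' lists open files that have already been
unlinked, which is one way to find out which process is holding the
space.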

> Any idea what could cause such a big temporary buffer output usage?

The only thing that comes to mind is if the output contains loads of
non-printable characters (e.g. \r or \0). With --results you should be
able to see how big the different files are for different arguments.
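
Once a suspiciously large stdout file has been found, a rough check for
non-printable bytes could look like the sketch below. The sample file
and its contents are made up for illustration; in practice you would
point the commands at a file under the --results directory:

```shell
# Sketch only: build a small sample containing one \r and three \0 bytes.
sample=$(mktemp)
printf 'host1 192.0.2.1\r\n\0\0\0host2 192.0.2.2\n' > "$sample"

total=$(wc -c < "$sample" | tr -d ' ')
# Delete printable characters and newlines; whatever remains is non-printable.
nonprint=$(tr -d '[:print:]\n' < "$sample" | wc -c | tr -d ' ')
echo "total=$total non-printable=$nonprint"

# od -c shows exactly which bytes they are.
od -c "$sample" | head
```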

If you discover that the output is actually correct (and that it takes
up 15 GB), then --compress might help you.
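
As a small sketch, assuming GNU Parallel and a compression program such
as gzip are installed: 'yes | head' stands in for a job with large,
repetitive output, and the job arguments 1 to 4 are arbitrary. The
output that reaches stdout is the same as without --compress; only the
buffered temporary files are compressed.

```shell
# Sketch only: four jobs of 100000 lines each, buffered in compressed tempfiles.
lines=$(parallel --compress 'yes "job {}" | head -n 100000' ::: 1 2 3 4 | wc -l | tr -d ' ')
echo "lines=$lines"
```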


