Re: Huge consumption of tmpdir while running parallel
From: Ole Tange
Subject: Re: Huge consumption of tmpdir while running parallel
Date: Sat, 14 Jun 2014 12:56:00 +0200
On Sat, Jun 14, 2014 at 2:13 AM, Antoine Drochon (perso)
<antoine@drochon.net> wrote:
> I am running into a disk space issue when I run a parallel command (GNU
> parallel 20140322).
>
> The pseudo code is as defined below:
Please do not use pseudo code, but make a working example that shows
the problem, as described under REPORTING BUGS in the man page:
    Your bug report should always include:

    · The error message you get (if any).

    · The complete output of parallel --version. If you are not
      running the latest released version you should specify why
      you believe the problem is not fixed in that version.

    · A complete example that others can run that shows the
      problem. This should preferably be small and simple. A
      combination of yes, seq, cat, echo, and sleep can reproduce
      most errors. If your example requires large files, see if
      you can make them by something like
      seq 1000000 > file or yes | head -n 10000000 > file.
      If your example requires remote execution, see if you can
      use localhost.

    · The output of your example. If your problem is not easily
      reproduced by others, the output might help them figure out
      the problem.

    · Whether you have watched the intro videos
      (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1),
      walked through the tutorial (man parallel_tutorial), and
      read the EXAMPLE section in the man page (man parallel -
      search for EXAMPLE:).

> The Bash script performs a dig command, some pure Bash instructions and writes
> a single line of 50 to 100 characters to stdout.
Then that should never use GB of data on /tmp.
You can try using '--results outdir'. This will create the same files
in outdir as in /tmp, but will not remove them.
> I interrupted the execution and I assume Parallel properly trapped the signal
> to clean up the temporary directory. I got back the 15 GB.
>
> Note: I was unable to see any temporary file in the tmpdir directory.
This is a feature: GNU Parallel uses tempfiles that are removed
immediately, but kept open. This way, no matter how GNU Parallel may
die, the cleanup will be done by the OS. The unfortunate and surprising
side effect is that your disk may run full, yet you cannot see any
files taking up the space.
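
The removed-but-open behaviour can be reproduced without GNU Parallel.
The following is only a minimal sketch of the general Unix mechanism,
not GNU Parallel's actual code; the 1 MiB size and the use of file
descriptor 3 are arbitrary choices for illustration.

```shell
#!/bin/bash
# Sketch: a file that is removed while still open keeps using disk space.
tmp=$(mktemp)                   # create a tempfile (in $TMPDIR or /tmp)
exec 3>"$tmp"                   # open it for writing on file descriptor 3
rm "$tmp"                       # remove the name; the data lives on while fd 3 is open
head -c 1048576 /dev/zero >&3   # write 1 MiB into the now nameless file

# The name is gone, so ls and du on the directory will not show the file.
if [ -e "$tmp" ]; then visible=yes; else visible=no; fi
echo "file visible in directory: $visible"

exec 3>&-                       # closing the descriptor lets the OS free the space
```

While the descriptor is open, df counts the space as used but du does
not. On systems that have lsof, running lsof +L1 should list such
unlinked open files.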
> Any idea what could cause such a big temporary buffer output usage?
The only thing that comes to mind is if the output contains loads of
non-printable characters (e.g. \r or \0). With --results you should be
able to see how big the different files are for different arguments.
If you discover that the output is actually correct (and that it takes
up 15 GB), then --compress might help you.
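
To check a saved output file for such characters, something like the
following might do. It is only a sketch: count_ctrl_bytes is a
hypothetical helper name, and the sample file is made up with printf
so the example can be run as is. In practice you would point the
function at the files written under outdir by --results.

```shell
#!/bin/bash
# Hypothetical helper: count NUL and carriage-return bytes in a file.
count_ctrl_bytes() {
  # tr -cd deletes every byte that is NOT a NUL or CR; wc -c counts the rest.
  tr -cd '\000\r' < "$1" | wc -c | tr -d ' '
}

# Made-up sample standing in for one job's saved output.
sample=$(mktemp)
printf 'host\r\n\0\0ok\n' > "$sample"

total=$(wc -c < "$sample" | tr -d ' ')
ctrl=$(count_ctrl_bytes "$sample")
echo "$ctrl of $total bytes are NUL or CR"

rm -f "$sample"
```

If a large share of the bytes are NUL or CR, the size comes from the
job's output itself and not from GNU Parallel's buffering.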
/Ole