Re: parallel bug: Warning: No more file handles. when one job is delayed


From: Ole Tange
Subject: Re: parallel bug: Warning: No more file handles. when one job is delayed (reproducible with a test case)
Date: Sat, 28 Oct 2017 00:28:02 +0200

On Fri, Oct 27, 2017 at 9:23 AM, Shlomi Fish <shlomif@shlomifish.org> wrote:

> Thanks for your work.

Good to know it is appreciated.

> Attached are two files to reproduce a bug I ran into with GNU parallel
> including the latest one on Mageia v7 x86-64:
>
> shlomif@telaviv1:~$ bash run-range.bash
> parallel: Warning: No more file handles.
> parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.
> ^CCompleted!
>
> The run-single.bash script is delayed for n=1 and meanwhile other jobs
> accumulate which may explain the problem. This problem caused me to lose one
> night of uptime on an AWS instance because "parallel" got stuck, so I'd
> appreciate an investigation and a fix.

Your problem can be illustrated with:

  seq 0 1000 | parallel -k -t sleep '{= $_ = $_ ? 0 : 10 =};echo {}'

This will run 'sleep 10' followed by 1000 jobs of 'sleep 0'. -t causes
the command to be printed as soon as it is started.

Because of -k, GNU Parallel must print the output in the same order
as the input. It does that by keeping the temporary output files of
the jobs it has run open. Before it can close any of those files, it
has to wait for the first job to complete. Because the other jobs
complete very quickly, GNU Parallel runs out of file handles, and
thus warns you:

  parallel: Warning: No more file handles.
  parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.

But it is just a warning: as soon as the first job completes, GNU
Parallel completes the remaining jobs.
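
The ordering constraint can be sketched in plain shell. This is only
an illustration, not how GNU Parallel is implemented: the helper name
ordered_demo is made up for this example, and it re-reads the
temporary files by name instead of holding them open.

```shell
# Illustration only: hypothetical helper that mimics ordered output.
ordered_demo() {
  tmpdir=$(mktemp -d) || return 1
  for i in 0 1 2 3; do
    # Job 0 is slow; the others finish at once.
    # Each job writes to its own temporary file.
    ( if [ "$i" -eq 0 ]; then sleep 1; fi; echo "job $i" ) > "$tmpdir/$i" &
  done
  # The output must start with job 0, so nothing can be printed
  # (and no buffer released) until the slow job is done.
  wait
  for i in 0 1 2 3; do
    cat "$tmpdir/$i"
  done
  rm -r "$tmpdir"
}

ordered_demo
```

The fast jobs are finished long before anything is printed; their
output just sits in the temporary files until job 0 is done.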

> Also see https://lists.gnu.org/archive/html/parallel/2017-07/msg00006.html .

If you used -k in that case, then we have the explanation: GNU
Parallel has not stopped. It is waiting for one of the jobs to
complete before it can close more file handles.
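
If the warning is a nuisance, the limit it mentions can be inspected
and raised for a single command. A minimal sketch, assuming your hard
limit is higher than your soft limit; with_max_fds is a made-up
helper name, not part of GNU Parallel:

```shell
# Hypothetical helper: run a command with the soft limit on open
# files raised to the hard limit. The function body is a subshell,
# so the change does not leak into the calling shell.
with_max_fds() (
  ulimit -Sn "$(ulimit -Hn)" && "$@"
)

# Show the limits that currently apply to this shell.
echo "soft limit: $(ulimit -Sn), hard limit: $(ulimit -Hn)"
```

It could then be used as:

  seq 0 1000 | with_max_fds parallel -k sleep '{= $_ = $_ ? 0 : 10 =};echo {}'

This only moves the point at which the warning appears; a long enough
delay in the first job will still use up the file handles.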


/Ole


