Re: busyloop in sigchld_handler

From: David Kastrup
Subject: Re: busyloop in sigchld_handler
Date: Wed, 14 Mar 2007 11:00:57 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

address@hidden (Kim F. Storm) writes:

> David Kastrup <address@hidden> writes:
>> Andreas Schwab <address@hidden> writes:
>>> David Kastrup <address@hidden> writes:
>>>> The CPU is claimed by the process with the loop, so no other process
>>>> may actually progress to a state which can be "wait"ed for.
>>> If there is no child to be waited for, then there is no loop.
>> In order to make sophistry solve the problem, you need to convince
>> the kernel.
> This happens in the sigchld handler - which is only invoked when there
> is a dead child (zombie) to "wait3" for - so we should not have to wait
> for the dead child to "really die".
> In addition, we call wait3 with WNOHANG, so it is not supposed to block
> if there are no dead children.
> That's why Andreas and I can't really see where the busy loop can
> happen, but since the loop _is_ observed, it is important to
> understand why it happens, not just install a "semi-random" patch
> which fixes the problem, but nobody can explain why.
> Perhaps we need to ask a Linux kernel hacker?
> Here's the code in condensed form:
>   while (1)
>     {
>       while (1)
>         {
>           errno = 0;
>           pid = wait3 (&w, WNOHANG | WUNTRACED, 0);
>           if (! (pid < 0 && errno == EINTR))
>             break;
>           /* Avoid a busyloop: wait3 is a system call, so we do not want
>              to prevent the kernel from actually sending SIGCHLD to emacs
>              by asking for it all the time.  */
>           sleep (1);
>         }
>       if (pid <= 0)
>         return;
>       /* handle death of child `pid' */
>     }
> So the problem is the interpretation of an EINTR error from
> wait3(..., WNOHANG, ...).
> The Linux man page says:
>        EINTR  if WNOHANG was not set and an unblocked signal or a SIGCHLD  was
>               caught.
> So WNOHANG => EINTR is not explained, but the usual meaning is that
> the wait3 was interrupted by some other signal - and if there is a
> loop, that signal is repeated somehow ...
> However, with the test code I inserted into the sigchld handler, and
> then executing M-x compile once after starting emacs -Q, it clearly
> shows that:
> a) the sigchld handler is entered exactly once.
> b) the first wait3 returns immediately with the pid
>    of the compile process,
> c) the next wait3 returns immediately with 0, since
>    there are no more processes to wait for.
> So where's the busy loop?
> The above code is the version for Linux - other variations of the code
> are used for other platforms, but the OP said this was observed on a
> GNU/Linux system.

The signal manpage says:

        When a signal occurs, and func points to a function, it is
        implementation-defined whether the equivalent of a:

                signal(sig, SIG_DFL);

        is executed or the implementation prevents some
        implementation-defined set of signals (at least including
        sig) from occurring until the current signal handling has
        completed.

So even though a system call made inside the SIGCHLD handler may be
interrupted by another signal, this does not mean that the other
signal's handler gets a chance to run before the SIGCHLD handler has
returned.
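
To make that concrete, here is a small sketch of my own (not Emacs
code).  It assumes handlers installed with sigaction rather than
signal, and it assumes SIGUSR1 is in the sa_mask of the SIGCHLD
handler; all names are made up for illustration.  SIGUSR1 is raised
from inside the SIGCHLD handler, and on GNU/Linux I would expect its
handler to run only after the SIGCHLD handler has returned:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical bookkeeping for the demonstration only.  */
static volatile sig_atomic_t inside_chld = 0;
static volatile sig_atomic_t usr1_runs = 0;
static volatile sig_atomic_t usr1_ran_inside = 0;

static void
on_usr1 (int sig)
{
  (void) sig;
  usr1_runs++;
  if (inside_chld)
    usr1_ran_inside = 1;   /* would mean we preempted the SIGCHLD handler */
}

static void
on_chld (int sig)
{
  (void) sig;
  inside_chld = 1;
  /* SIGUSR1 is blocked here (see sa_mask below), so it only becomes
     pending; its handler cannot run yet.  */
  raise (SIGUSR1);
  inside_chld = 0;
}

/* Returns 1 if SIGUSR1 was handled exactly once and only after the
   SIGCHLD handler had finished, 0 otherwise, -1 on setup failure.  */
int
deferred_signal_demo (void)
{
  struct sigaction sa;

  usr1_runs = 0;
  usr1_ran_inside = 0;

  memset (&sa, 0, sizeof sa);
  sigemptyset (&sa.sa_mask);
  sa.sa_handler = on_usr1;
  if (sigaction (SIGUSR1, &sa, NULL) < 0)
    return -1;

  /* Assumption: block SIGUSR1 for the duration of the SIGCHLD handler.  */
  sigaddset (&sa.sa_mask, SIGUSR1);
  sa.sa_handler = on_chld;
  if (sigaction (SIGCHLD, &sa, NULL) < 0)
    return -1;

  raise (SIGCHLD);   /* no child involved; we just trigger the handler */
  return usr1_runs == 1 && !usr1_ran_inside;
}
```

With plain signal() the set of blocked signals is
implementation-defined, which is exactly the uncertainty the manpage
text describes.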

Maybe we should not loop, but rather return from the signal handler,
possibly re-raising the signal?  That may give the system the leeway
to deal with whatever caused EINTR in the first place.
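
A rough sketch of what I mean follows.  This is not a patch against
Emacs: it uses waitpid instead of wait3, the handler and helper names
are invented, and the counter merely stands in for "handle death of
child".  On EINTR the handler re-raises SIGCHLD and returns instead
of sleeping and retrying:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the real per-child bookkeeping.  */
static volatile sig_atomic_t reaped = 0;

static void
reap_children (int sig)
{
  int saved_errno = errno;
  int status;
  pid_t pid;

  /* Collect everything that is ready, without blocking.  */
  while ((pid = waitpid (-1, &status, WNOHANG | WUNTRACED)) > 0)
    if (WIFEXITED (status) || WIFSIGNALED (status))
      reaped++;

  /* Do not retry in a loop on EINTR.  SIGCHLD is blocked while this
     handler runs, so the re-raised signal stays pending until we
     return, and other pending work gets its turn first.  */
  if (pid < 0 && errno == EINTR)
    raise (sig);

  errno = saved_errno;
}

/* Illustration only: fork N children that exit at once and return how
   many the handler collected (or -1 on failure).  */
int
spawn_and_count (int n)
{
  struct sigaction sa;
  struct timespec tick = { 0, 1000000 };   /* 1 ms */
  int i, tries;

  reaped = 0;
  memset (&sa, 0, sizeof sa);
  sigemptyset (&sa.sa_mask);
  sa.sa_handler = reap_children;
  sa.sa_flags = SA_RESTART;
  if (sigaction (SIGCHLD, &sa, NULL) < 0)
    return -1;

  for (i = 0; i < n; i++)
    {
      pid_t pid = fork ();
      if (pid < 0)
        return -1;
      if (pid == 0)
        _exit (0);
    }

  /* Bounded polling so the demonstration cannot hang.  */
  for (tries = 0; reaped < n && tries < 5000; tries++)
    nanosleep (&tick, NULL);
  return reaped;
}
```

Whether re-raising is actually needed depends on what causes the
EINTR in the first place, which is still the open question here.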

David Kastrup
