emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD


From: Philipp Stephani
Subject: Re: scratch/sigchld-fd 8f0ce42 1/2: Fix deadlock when receiving SIGCHLD during 'pselect'.
Date: Tue, 19 Jan 2021 21:22:04 +0100

Am Di., 19. Jan. 2021 um 20:14 Uhr schrieb Eli Zaretskii <eliz@gnu.org>:
>
> > From: Philipp Stephani <p.stephani2@gmail.com>
> > Date: Tue, 19 Jan 2021 19:21:39 +0100
> > Cc: Emacs developers <emacs-devel@gnu.org>, Philipp Stephani 
> > <phst@google.com>
> > > In general, pselect is supposed to return with EINTR when SIGCHLD
> > > happoens while we are inside the call to pselect, and EINTR seems to
> > > be already handled by wait_reading_process_output.  So I wonder why we
> > > need that additional "self-pipe" to be watched by pselect.
> >
> > Yes, I'm wondering about that as well, but it's definitely the
> > behavior I see. Before commiting to master, I ran the test
> > process-tests/fd-setsize-no-crash/make-process multiple times with and
> > without the commit, and the outcome was clear: without the commit
> > accept-process-output would frequently hang, with the commit it never
> > hangs.
> > This is pure speculation, but I could imagine multiple things going on:
> > - Maybe there's no guarantee that pselect actually returns EINTR on SIGCHLD.
> > - Maybe EINTR is returned too early, before the signal handler got the
> > chance to update the process status.
>
> I'd be happier if we had some direct evidence to these effects.  I'd
> also be surprised to hear that pselect doesn't return with EINTR when
> SIGCHLD comes in.  It is more likely that SIGCHLD is delivered before
> we call pselect, but if that is the case, we should be able to
> reliably detect that, I think.

So I've added a ton of logging to process.c, and the series of events
I observe (without the patch) is as follows (line numbers are
approximate due to the logging statements):

process.c:4729: Faccept_process_output: enter
process.c:5139: wait_reading_process_output: enter
process.c:5193: wait_reading_process_output: outer loop
process.c:5322: wait_reading_process_output: update_tick = 261,
process_tick = 261
process.c:5554: wait_reading_process_output: before pselect; max_desc = 1019
process.c:5601: wait_reading_process_output: after pselect: nfds = -1
process.c:5641: wait_reading_process_output: EINTR
process.c:5193: wait_reading_process_output: outer loop
process.c:5322: wait_reading_process_output: update_tick = 261,
process_tick = 261
process.c:7189: handle_child_signal: enter
process.c:7234: handle_child_signal: process test 5: change status to
0; new process_tick = 262
process.c:144: handle_child_signal: leave
process.c:5554: wait_reading_process_output: before pselect; max_desc = 1017

and then Emacs hangs.
So wait_reading_process_output indeed first receives EINTR, loops
around, and checks for the process_tick change before SIGCHLD is
handled. By the time it reruns pselect it's too late.

>
> > > In addition, AFAIU this pipe should not be needed on MS-Windows, where
> > > the pselect emulation waits on the sub-process handles together with
> > > the other file descriptors, and so gets awakened when a process exits
> > > or dies.  But again, without knowing the exact situations against
> > > which this changeset tries to protect, it is hard to make a decision.
> >
> > It's definitely not needed on Windows, which has a superior mechanism
> > anyway (process handles are waitable objects in Windows). I opted to
> > create the additional pipe on Windows as well - the costs should be
> > small, and it keeps the code more consistent between the operating
> > systems.
>
> The thing is, on Windows we can only wait on up to 64 handles (unless
> we complicate the code with multilevel wait, that is), so every
> unnecessary descriptor we need to wait on means we can support fewer
> simultaneous subprocesses.  We are already limited to just 32
> subprocesses, which is quite low a number.

OK, that's a good point, I didn't know about this limitation. I'll see
that I can remove the pipe on Windows.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]