Re: [BUG] Bash not reacting to Ctrl-C

From: Linus Torvalds
Subject: Re: [BUG] Bash not reacting to Ctrl-C
Date: Mon, 28 Feb 2011 22:28:07 -0800

On Mon, Feb 28, 2011 at 6:20 PM, Chet Ramey <address@hidden> wrote:
> The patch looks good.  I'll take a closer look and probably produce a
> patch for bash-4.2 based on it.  Thanks for taking a look.

So I think that Oleg Nesterov is correct in that the -1 return with
errno==EINTR will never actually be seen by the caller, because the
wait is re-tried by the loop in "waitchld()".

So I don't think my patch is really doing what it _intends_ to do.

So in wait_for() (after my patch):

          r = waitchld (pid, 1);
          if (r == -1 && errno == EINTR && wait_sigint_received)
              child_blocked_sigint = 1;

that child_blocked_sigint assignment may never actually execute. Which
certainly explains why I couldn't reproduce the lost ^C any more: the
patch does make the race go away, but only by basically never
considering a child to have blocked ^C.

So I think the attached patch is better - it moves all the
child_blocked_sigint logic down into waitchld() itself, so that it can
really see the right error values.
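
To make the EINTR point concrete, here is a minimal standalone sketch
(it is not the bash code and not the attached patch). The names
wait_retrying, eintr_seen and fake_wait_interrupted_once are made up
for illustration; the fake stands in for a wait call that is
interrupted once and then reaps a child. A retry loop of this shape
hides the EINTR from its caller, so the only place that can record it
is inside the loop:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical hook so the wait call can be replaced by a fake. */
typedef pid_t (*wait_fn)(int *status);

/* Hypothetical flag: set when a wait was interrupted by a signal. */
int eintr_seen;

/* Retry the wait until it succeeds or fails with a real error.
 * The caller never observes -1/EINTR; only this loop does. */
pid_t wait_retrying(wait_fn do_wait, int *status)
{
    assert(do_wait != NULL);
    for (;;) {
        pid_t r = do_wait(status);
        if (r != -1 || errno != EINTR)
            return r;       /* a pid, or a real error such as ECHILD */
        eintr_seen = 1;     /* the interruption is only visible here */
    }
}

/* Fake wait for illustration: interrupted once, then a child exits 0. */
pid_t fake_wait_interrupted_once(int *status)
{
    static int calls;
    if (calls++ == 0) {
        errno = EINTR;
        return -1;
    }
    *status = 0;
    return 4623;
}
```

Called with the fake, wait_retrying() returns the pid and leaves
eintr_seen set, while a check for -1/EINTR placed after the call would
never fire.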

HOWEVER! When I do this, I can see the "lost ^C" issue again. It seems
to be a bit harder to trigger than before, but I have a very hard time
judging that; it's subjective. But I get traces like this:

  22:13:02.434759 rt_sigaction(SIGINT, {0x43c980, [], SA_RESTORER, 0x301f833140}, {SIG_DFL, [
  22:13:02.434790 wait4(-1, 0x7fffaa90befc, 0, NULL) = ? ERESTARTSYS (To be restarted)
  22:13:02.434945 --- SIGINT (Interrupt) @ 0 (0) ---
  22:13:02.434957 rt_sigreturn(0x2)       = -1 EINTR (Interrupted system call)
  22:13:02.434980 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 4623
  22:13:02.435005 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x301f833140}, {0x43c980, [
  22:13:02.435041 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
  22:13:02.435063 --- SIGCHLD (Child exited) @ 0 (0) ---
  22:13:02.435074 wait4(-1, 0x7fffaa90ba5c, WNOHANG, NULL) = -1 ECHILD (No child processes)
  22:13:02.435095 rt_sigreturn(0xffffffffffffffff) = 0

ie we hit the exact case where the ^C happens just as the child is
exiting, so the child has already done the "exit(0)" system call, but
the exit() takes long enough that bash has time to react to the ^C and
see that EINTR.

Note the timing: the SIGINT happens at 02.434945, with the wait
returning successfully at 02.434980, only 35 microseconds later. We're
talking microseconds, but the whole "fork+exec+wait" sequence is very
fast, so it's just the luck of the draw exactly where the ^C lands.

I can't get it to happen reliably, but it is not entirely rare either.
It probably needs a lot of luck, and SMP with some unlucky timing, to
trigger. And it probably helps that the Linux fork/exec cost is quite
low, which makes it easier to hit the window right at the exit().

Anyway - to recap: looking at EINTR isn't sufficient, and doesn't
close the race. It's _really_ hard to decide between the two ambiguous
cases of "child exited without errors on its own just as ^C came in"
and "child blocked ^C and exited afterwards without errors".
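
As a rough illustration of why the two cases look the same from the
shell's side, here is a small sketch. The enum and the function name
classify_wait are hypothetical and not taken from bash. All the
parent has is "did I get a SIGINT while waiting" plus the wait status,
and only a child that actually died of SIGINT is unambiguous:

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical verdicts, for illustration only. */
enum sigint_verdict {
    NO_SIGINT,              /* no ^C arrived during the wait */
    CHILD_DIED_OF_SIGINT,   /* the child was killed by the ^C */
    AMBIGUOUS               /* ^C arrived, child exited on its own */
};

/* Classify one finished wait from the only two facts the parent has. */
enum sigint_verdict classify_wait(int sigint_received, int status)
{
    if (!sigint_received)
        return NO_SIGINT;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGINT)
        return CHILD_DIED_OF_SIGINT;
    /* Either the child was already exiting when ^C came in, or it
     * caught/blocked ^C and exited later. The status cannot tell. */
    return AMBIGUOUS;
}
```

Both "exit(0) raced with ^C" and "handled ^C, then exit(0)" land in
the AMBIGUOUS branch with an identical status.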

Anybody have any other heuristics to try to disambiguate the two
cases? If we really are talking about "interactive programs that catch
^C" here, then it is possible that the only real heuristic is one that
is based on time. How _long_ did it take for the process to exit? If
the process exits without WIFSIGNALED a long time after ^C (where
"long" is obviously only in computer terms), then we might assume it
really blocked it and considered it actual input.

I don't much like the idea of time-based heuristics, but right now I
think bash resolves the ambiguity the wrong way around: bash prefers
to err on the "drop ^C" side, even though that's likely to be the rare
case. A real interactive program that uses ^C (like an editor) isn't
actually ever going to see a SIGINT _at_all_, since it will set the
tty state to -isig and actually read the ^C as the character '\003'
(rather than have any SIGINT issues). I dunno.
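
For reference, the -isig setup can be sketched with termios roughly
like this. The helper names are hypothetical, and a real editor would
normally change more flags than this (ICANON, ECHO and so on); this
only shows the part that turns ^C from a signal into an input byte:

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Clear ISIG in a termios copy so INTR/QUIT/SUSP characters are
 * passed through as ordinary input instead of raising signals. */
void disable_tty_signals(struct termios *t)
{
    assert(t != NULL);
    t->c_lflag &= ~(tcflag_t)ISIG;
}

/* Apply that to a real terminal. The previous settings are stored in
 * *saved so the caller can restore them with tcsetattr() on exit.
 * Returns 0 on success, -1 on error (for example if fd is not a tty). */
int tty_read_intr_as_input(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) == -1)
        return -1;
    t = *saved;
    disable_tty_signals(&t);
    return tcsetattr(fd, TCSAFLUSH, &t);
}
```

After tty_read_intr_as_input(STDIN_FILENO, &saved) succeeds, a read()
on the terminal returns the ^C keystroke as a data byte and no SIGINT
is generated for the foreground process group.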


Attachment: patch.diff
Description: Text Data