[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PULL 16/45] vl: Add option to avoid stopping VM upon guest panic

From: Alejandro Jimenez
Subject: Re: [PULL 16/45] vl: Add option to avoid stopping VM upon guest panic
Date: Wed, 20 Jan 2021 00:28:14 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.1

On 1/19/2021 4:34 PM, Peter Maydell wrote:
On Tue, 15 Dec 2020 at 18:11, Paolo Bonzini <pbonzini@redhat.com> wrote:
From: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>

The current default action of pausing a guest after a panic event
is received leaves the responsibility to resume guest execution to the
management layer. The reasons for this behavior are discussed here:

However, in instances like the case of older guests (Linux and
Windows) using a pvpanic device but missing support for the
PVPANIC_CRASHLOADED event, and Windows guests using the hv-crash
enlightenment, it is desirable to allow the guests to continue
running after sending a PVPANIC_PANICKED event. This allows such
guests to proceed to capture a crash dump and automatically reboot
without intervention of a management layer.

Add an option to avoid stopping a VM after a panic event is received,
by passing:

-action panic=none

in the command line arguments, or during runtime by using an upcoming
QMP command.
Hi. This commit message doesn't say it's changing the default
action, but the change does:

@@ -3899,6 +3899,8 @@ DEF("action", HAS_ARG, QEMU_OPTION_action,
      "                   action when guest reboots [default=none]\n"
      "-action shutdown=poweroff|pause\n"
      "                   action when guest shuts down [default=poweroff]\n"
+    "-action panic=poweroff|pause|none\n"
+    "                   action when guest panics [default=poweroff]\n"
      "-action watchdog=reset|shutdown|poweroff|inject-nmi|pause|debug|none\n"
      "                   action when watchdog fires [default=reset]\n",
  RebootAction reboot_action = REBOOT_ACTION_NONE;
  ShutdownAction shutdown_action = SHUTDOWN_ACTION_POWEROFF;
+PanicAction panic_action = PANIC_ACTION_POWEROFF;
We used to default to 'pause' and now we default to 'poweroff'.
Hi Peter.

My rationale for setting the panic action to 'poweroff' was to keep the default behavior of QEMU when '-no-shutdown' is not specified, and a panic occurs. I believe that in order to accomplish that, the default panic action should still be 'poweroff', but as you point out there is an instance where the behavior changes. Specifically, when '-no-shutdown' is not used there is now one fewer QMP event issued when a guest panic is detected, before stopping the VM and powering off.

I tried to account for this scenario in the original patches, but I failed to catch the problem after the rebase when the changes were merged. I'll test and send a fix for this issue in the next few days.

We noticed this because it broke an in-flight test case for
the pvpanic-pci device from Mihai (which was expecting to see
the device in 'pause' state and found it was now in 'poweroff').
The test is just checking for the arrival of the QMP event, and not actually expecting the VM to be paused, correct? I see that if a test/management app is expecting to receive a GUEST_PANICKED event with the specific 'pause' action, then it might be confused. But any such tests would only be able to check for the arrival of the QMP event, and not actually expect to issue any commands to a paused VM, since the next block of code in QEMU immediately powers off and shutdowns when '-no-shutdown' is not requested. This was the typical behavior before the patches.

Test cases aren't very exciting, but was it really intentional
to change the default behaviour?
My intention was to preserve the default behavior. Perhaps Paolo wanted to reduce the number of GUEST_PANICKED events by removing the one with 'pause' action? You could consider it superfluous since it is immediately followed by another indicating the 'poweroff' action... Unless I hear otherwise from either of you, I'll work on a fix to keep the same number and type of events sent.

Thank you,

  It's part of the user-facing
surface of QEMU, so if we did intend a default change that ought
really to be more clearly stated (and noted in the Changelog) I think.

-- PMM

reply via email to

[Prev in Thread] Current Thread [Next in Thread]