qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 0/3] python/machine.py: refactor shutdown


From: Philippe Mathieu-Daudé
Subject: Re: [PATCH v3 0/3] python/machine.py: refactor shutdown
Date: Wed, 17 Jun 2020 19:13:00 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 6/16/20 11:49 PM, Cleber Rosa wrote:
> On Mon, Jun 15, 2020 at 05:21:18PM +0200, Philippe Mathieu-Daudé wrote:
>> On 6/9/20 11:55 PM, John Snow wrote:
>>>
>>>
>>> On 6/9/20 9:08 AM, Philippe Mathieu-Daudé wrote:
>>>> Hi John,
>>>>
>>>> On 6/4/20 9:52 PM, John Snow wrote:
>>>>> v3:
>>>>>  - Split _post_shutdown refactor into own patch (now 1/3)
>>>>>  - Re-add sigkill warning squelch (now 3/3)
>>>>>
>>>>> NOTE: I re-added the squelch in its own patch for review purposes, but
>>>>> for the purposes of avoiding temporary breakage, a maintainer may wish
>>>>> to squash patches 2 and 3 if they are accepted.
>>>>>
>>>>> v2: Philippe took patches 1, 3 and 4.
>>>>>
>>>>> This is a re-write of what was:
>>>>> [PATCH RFC 03/32] python//machine.py: remove bare except
>>>>> [PATCH 2/4] python/machine.py: remove bare except
>>>>>
>>>>> It's a bit heavier handed, but it should address some of kwolf's
>>>>> feedback from the RFC version.
>>>>>
>>>>> Applies straight to origin/master, ought to pass pylint and flake8:
>>>>>
>>>>>> cd ~/src/qemu/python/qemu
>>>>>> pylint *.py
>>>>>> flake8 *.py
>>>>>
>>>>> John Snow (3):
>>>>>   python/machine.py: consolidate _post_shutdown()
>>>>>   python/machine.py: refactor shutdown
>>>>>   python/machine.py: re-add sigkill warning suppression
>>>>>
>>>>>  python/qemu/machine.py | 100 +++++++++++++++++++++++++++++------------
>>>>>  1 file changed, 71 insertions(+), 29 deletions(-)
>>>>>
>>>>
>>>> I'm now seeing this error:
>>>>
>>>> 21:31:58 DEBUG| / # reboot
>>>> 21:32:01 DEBUG| / # reboot: Restarting system
>>>> 21:32:01 DEBUG| >>> {'execute': 'quit'}
>>>> 21:32:01 WARNI| qemu received signal 9; command:
>>>> "mips-softmmu/qemu-system-mips -display none -vga none -chardev
>>>> socket,id=mon,path=/tmp/tmp679upvrk/qemu-10292-monitor.sock -mon
>>>> chardev=mon,mode=control -machine malta -chardev
>>>> socket,id=console,path=/tmp/tmp679upvrk/qemu-10292-console.sock,server,nowait
>>>> -serial chardev:console -kernel
>>>> /tmp/avocado_b3aaagr9/avocado_job_5bj0xe1h/12-tests_acceptance_boot_linux_console.py_BootLinuxConsole.test_mips_malta_cpio/boot/vmlinux-4.5.0-2-4kc-malta
>>>> -initrd
>>>> /tmp/avocado_b3aaagr9/avocado_job_5bj0xe1h/12-tests_acceptance_boot_linux_console.py_BootLinuxConsole.test_mips_malta_cpiorootfs.cpio
>>>> -append printk.time=0 console=ttyS0 console=tty rdinit=/sbin/init
>>>> noreboot -no-reboot"
>>>> 21:32:01 ERROR|
>>>> 21:32:01 ERROR| Reproduced traceback from:
>>>> /home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/core/test.py:886
>>>> 21:32:01 ERROR| Traceback (most recent call last):
>>>> 21:32:01 ERROR|   File
>>>> "/home/travis/build/philmd/qemu/build/tests/acceptance/avocado_qemu/__init__.py",
>>>> line 195, in tearDown
>>>> 21:32:01 ERROR|     vm.shutdown()
>>>> 21:32:01 ERROR|   File
>>>> "/home/travis/build/philmd/qemu/python/qemu/machine.py", line 449, in
>>>> shutdown
>>>> 21:32:01 ERROR|     self._do_shutdown(has_quit)
>>>> 21:32:01 ERROR|   File
>>>> "/home/travis/build/philmd/qemu/python/qemu/machine.py", line 426, in
>>>> _do_shutdown
>>>> 21:32:01 ERROR|     self._soft_shutdown(has_quit, timeout)
>>>> 21:32:01 ERROR|   File
>>>> "/home/travis/build/philmd/qemu/python/qemu/machine.py", line 413, in
>>>> _soft_shutdown
>>>> 21:32:01 ERROR|     self._qmp.cmd('quit')
>>>> 21:32:01 ERROR|   File
>>>> "/home/travis/build/philmd/qemu/python/qemu/qmp.py", line 271, in cmd
>>>> 21:32:01 ERROR|     return self.cmd_obj(qmp_cmd)
>>>> 21:32:01 ERROR|   File
>>>> "/home/travis/build/philmd/qemu/python/qemu/qmp.py", line 249, in cmd_obj
>>>> 21:32:01 ERROR|     
>>>> self.__sock.sendall(json.dumps(qmp_cmd).encode('utf-8'))
>>>> 21:32:01 ERROR| BrokenPipeError: [Errno 32] Broken pipe
>>>> 21:32:01 ERROR|
>>>> 21:32:01 DEBUG| DATA (filename=output.expected) => NOT FOUND (data
>>>> sources: variant, test, file)
>>>> 21:32:01 DEBUG| DATA (filename=stdout.expected) => NOT FOUND (data
>>>> sources: variant, test, file)
>>>> 21:32:01 DEBUG| DATA (filename=stderr.expected) => NOT FOUND (data
>>>> sources: variant, test, file)
>>>> 21:32:01 DEBUG| Not logging /var/log/syslog (lack of permissions)
>>>> 21:32:01 ERROR| ERROR
>>>> 12-tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips_malta_cpio
>>>> -> TestSetupFail: [Errno 32] Broken pipe
>>>> 21:32:01 INFO |
>>>>
>>>> https://travis-ci.org/github/philmd/qemu/jobs/696142277#L5329
>>>>
>>>
>>> Gotcha.
>>>
>>> The problem here is that `test_mips_malta_cpio` in boot_linux_console.py
>>> does this:
>>>
>>>         self.vm.add_args('-kernel', kernel_path,
>>>                          '-initrd', initrd_path,
>>>                          '-append', kernel_command_line,
>>>                          '-no-reboot')
>>>
>>> and then:
>>>
>>> exec_command_and_wait_for_pattern(self, 'reboot',
>>>                          'reboot: Restarting system')
>>>
>>> and (in avocado_qemu/) __init__.py does this:
>>>
>>>     def tearDown(self):
>>>         for vm in self._vms.values():
>>>             vm.shutdown()
>>>
>>>
>>>
>>> What's happening here is that we are instructing QEMU to *close* when
>>> the guest reboots instead of allowing it to reboot. Then, we are issuing
>>> a reboot command to the guest, which will effectively terminate QEMU as
>>> well. Finally, we are trying to send a shutdown command to QEMU, but
>>> QEMU has already gone.
>>>
>>> Now, in the shutdown code, we do make an attempt to catch this:
>>>
>>> def is_running(self):
>>>     """Returns true if the VM is running."""
>>>     return self._popen is not None and self._popen.poll() is None
>>>
>>> But, well, race conditions.
>>>
>>> When we make it here:
>>>
>>>         if self._qmp is not None:
>>>             if not has_quit:
>>>                 self._qmp.cmd('quit')
>>>             self._qmp.close()
>>>
>>> We believe we are running and we believe we have an open QMP socket.
>>> Attempting to engage the socket by sending 'quit' causes the error.
>>>
>>> It's a tight window: if quit happens earlier, we send the command
>>> successfully and everything's OK. If quit happens later, we realize QEMU
>>> isn't running and proceed to cleanup.
>>
>> Nice debugging :)
>>
>>>
>>> Ultimately:
>>>
>>> - Avocado should not try to shut down QEMU twice, but
>>> - machine.py shouldn't enable the race condition either.
>>>
>>>
>>>
>>> for my part, how about this:
>>>
>>> diff --git a/python/qemu/machine.py b/python/qemu/machine.py
>>> index 99bcb499878..813f8e477db 100644
>>> --- a/python/qemu/machine.py
>>> +++ b/python/qemu/machine.py
>>> @@ -385,7 +385,14 @@ def _soft_shutdown(self, has_quit: bool = False,
>>> timeout: int = 3) -> None:
>>>
>>>          if self._qmp is not None:
>>>              if not has_quit:
>>> -                self._qmp.cmd('quit')
>>> +                try:
>>> +                    self._qmp.cmd('quit')
>>> +                except (BrokenPipeError, ConnectionResetError):
>>> +                    # QMP went away just before or just after sending
>>> 'quit'
>>> +                    if not self.is_running():
>>> +                        # "Mission Accomplished"
>>> +                        pass
>>> +                    raise
>>
>> Looks OK to me, Cleber/Eduardo can you Ack?
>>
> 
> John's description seems spot on, and so does the solution.  I was
> unable, though, to reproduce it given that the window is indeed tiny.
> 
> Phil,
> 
> In addition to this, wouldn't it make sense to:
> 
> diff --git a/tests/acceptance/boot_linux_console.py 
> b/tests/acceptance/boot_linux_console.py
> index 3d02519660..e117597e3b 100644
> --- a/tests/acceptance/boot_linux_console.py
> +++ b/tests/acceptance/boot_linux_console.py
> @@ -180,8 +180,8 @@ class BootLinuxConsole(LinuxKernelTest):
>                                 + 'rdinit=/sbin/init noreboot')
>          self.vm.add_args('-kernel', kernel_path,
>                           '-initrd', initrd_path,
> -                         '-append', kernel_command_line,
> -                         '-no-reboot')
> +                         '-append', kernel_command_line)

Very few machines implement the hardware shutdown (sending
the SHUTDOWN_CAUSE_GUEST_SHUTDOWN event). AFAIK The
SHUTDOWN_CAUSE_GUEST_RESET is a bit more present.
When the hardware doesn't model this, the guest loops
infinitely. Some physical machines are not supposed to
power off, until cutting its power. So firmware also loops
without trying to power things off.

'-no-reboot' means "if the guest tries to reboot, consider
the emulation done".

> +
>          self.vm.launch()
>          self.wait_for_console_pattern('Boot successful.')
>  
> @@ -189,8 +189,8 @@ class BootLinuxConsole(LinuxKernelTest):
>                                                  'BogoMIPS')
>          exec_command_and_wait_for_pattern(self, 'uname -a',
>                                                  'Debian')
> -        exec_command_and_wait_for_pattern(self, 'reboot',
> -                                                'reboot: Restarting system')
> +        exec_command_and_wait_for_pattern(self, 'halt',
> +                                                'reboot: System halted')
>  
>      @skipUnless(os.getenv('AVOCADO_ALLOW_UNTRUSTED_CODE'), 'untrusted code')
>      def test_mips64el_malta_5KEc_cpio(self):
> 
> Do you know of any reason to "fake halt" (reboot) instead of actually
> halting from the guests?

This might be my embedded system reflex (explained earlier, some
hardware can not power itself down). I will give your diff a try.

> 
>>>              self._qmp.close()
>>>
>>>          self._popen.wait(timeout=timeout)
>>>
>>
> 
> Either way (^ with this extra block):
> 
> Reviewed-by: Cleber Rosa <crosa@redhat.com>
> Tested-by: Cleber Rosa <crosa@redhat.com>

Thanks!




reply via email to

[Prev in Thread] Current Thread [Next in Thread]