Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging


From: Daniel P . Berrangé
Subject: Re: [PATCH 0/5] Introduce 'yank' oob qmp command to recover from hanging qemu
Date: Mon, 11 May 2020 12:49:47 +0100
User-agent: Mutt/1.13.4 (2020-02-15)

On Mon, May 11, 2020 at 01:14:34PM +0200, Lukas Straub wrote:
> Hello Everyone,
> In many cases, if qemu has a network connection (qmp, migration, chardev,
> etc.) to some other server and that server dies or hangs, qemu hangs too.

If QEMU as a whole hangs due to a stalled network connection, that is a
bug in QEMU that we should be fixing IMHO. QEMU should be doing non-blocking
I/O in general, such that if the network connection or remote server stalls,
we simply stop sending I/O - we shouldn't ever hang the QEMU process or main
loop.
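
To illustrate what I mean by non-blocking I/O, here is a minimal sketch in
plain POSIX C. It is not taken from the QEMU tree; set_nonblocking() and
try_send() are hypothetical helper names used only for this example. The
point is that a stalled peer makes the send return "would block" instead of
parking the calling thread:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch a file descriptor to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: try to queue data on a non-blocking socket.
 * Returns the number of bytes queued, 0 if the socket would block
 * (for example because the peer has stopped reading), or -1 on any
 * other error. It never waits for the peer. */
static ssize_t try_send(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL && len > 0);

    do {
        n = send(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        return 0; /* caller should retry once the fd polls writable */
    }
    return n;
}
```

A caller that gets 0 back would normally register the fd with its event
loop and retry when it becomes writable, so the main loop keeps running
while the remote end is stalled.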

There are places in QEMU code which are not well behaved in this respect,
but many are, and others are being fixed as they are found to matter.

Arguably any place in QEMU code which can result in a hang of QEMU in the
event of a stalled network should be considered a security flaw, because
the network is untrusted in general.

> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.

IIUC, invoking the "yank" command unconditionally kills every single
network connection in QEMU that has registered with the "yank" subsystem.
IMHO this is way too big of a hammer, even if we accept there are bugs in
QEMU not handling stalled networking well.

E.g. if a chardev hangs QEMU and we tear down everything, we also kill the
NBD connection used for the guest disk and needlessly break guest I/O.
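
To make the granularity concern concrete, here is a rough sketch of a
registry where each callback is tagged with an instance name, so that only
the stalled connection is shut down. This is purely an assumption for
illustration and is not the interface proposed in this series; yank_register(),
yank_instance(), shutdown_socket_cb() and the instance names are all
hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define YANK_MAX 16

typedef void (*yank_fn)(void *opaque);

/* One registered recovery callback, tagged with the instance it belongs to. */
struct yank_entry {
    const char *instance;
    yank_fn fn;
    void *opaque;
};

static struct yank_entry yank_entries[YANK_MAX];
static size_t yank_count;

/* Register a callback for a named instance (hypothetical API).
 * Returns 0 on success, -1 if the table is full. */
static int yank_register(const char *instance, yank_fn fn, void *opaque)
{
    assert(instance != NULL && fn != NULL);
    if (yank_count == YANK_MAX) {
        return -1;
    }
    yank_entries[yank_count++] = (struct yank_entry){ instance, fn, opaque };
    return 0;
}

/* Run only the callbacks registered for the given instance.
 * Returns how many callbacks were invoked. */
static size_t yank_instance(const char *instance)
{
    size_t called = 0;

    for (size_t i = 0; i < yank_count; i++) {
        if (strcmp(yank_entries[i].instance, instance) == 0) {
            yank_entries[i].fn(yank_entries[i].opaque);
            called++;
        }
    }
    return called;
}

/* Example callback: opaque points at an int holding the socket fd. */
static void shutdown_socket_cb(void *opaque)
{
    shutdown(*(int *)opaque, SHUT_RDWR);
}
```

With something along these lines, yanking a hung chardev instance would
leave a socket registered under a separate NBD instance untouched.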

Likewise, doing this in the chardev backend is not desirable, because
hangs are typically caused by the frontend device using the chardev's
blocking I/O calls instead of its non-blocking ones.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



