qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 1/1] osdep: asynchronous teardown for shutdown on Linux


From: Halil Pasic
Subject: Re: [PATCH v1 1/1] osdep: asynchronous teardown for shutdown on Linux
Date: Tue, 7 Dec 2021 15:59:04 +0100

On Mon, 6 Dec 2021 11:47:55 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Mon, Dec 06, 2021 at 12:43:12PM +0100, Claudio Imbrenda wrote:
> > On Mon, 6 Dec 2021 11:21:10 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> >   
> > > On Mon, Dec 06, 2021 at 12:06:11PM +0100, Claudio Imbrenda wrote:  
> > > > This patch adds support for asynchronously tearing down a VM on Linux.
> > > > 
> > > > When qemu terminates, either naturally or because of a fatal signal,
> > > > the VM is torn down. If the VM is huge, it can take a considerable
> > > > amount of time for it to be cleaned up. In case of a protected VM, it
> > > > might take even longer than a non-protected VM (this is the case on
> > > > s390x, for example).
> > > > 
> > > > Some users might want to shut down a VM and restart it immediately,
> > > > without having to wait. This is especially true if management
> > > > infrastructure like libvirt is used.
> > > > 
> > > > This patch implements a simple trick on Linux to allow qemu to return
> > > > immediately, with the teardown of the VM being performed
> > > > asynchronously.
> > > > 
> > > > If the new commandline option -async-teardown is used, a new process is
> > > > spawned from qemu using the clone syscall, so that it will share its
> > > > address space with qemu.
> > > > 
> > > > The new process will then wait until qemu terminates, and then it will
> > > > exit itself.
> > > > 
> > > > This allows qemu to terminate quickly, without having to wait for the
> > > > whole address space to be torn down. The teardown process will exit
> > > > after qemu, so it will be the last user of the address space, and
> > > > therefore it will take care of the actual teardown.
> > > > 
> > > > The teardown process will share the same cgroups as qemu, so both
> > > > memory usage and cpu time will be accounted properly.    
> > > 
> > > If this suggested workaround has any benefit to the shutdown of a VM
> > > with libvirt, then it is a bug in libvirt IMHO.
> > > 
> > > When libvirt tears down a QEMU VM, it should be waiting for *every*
> > > process in the VM's cgroup to be terminated before it reports that
> > > the VM is shutoff. IOW, the fact that this workaround lets the main
> > > QEMU process exit quickly should not matter. libvirt should still
> > > be blocked in exactly the same place in its code, waiting for the
> > > "async" cleanup process to exit. IOW, this should not be async at
> > > all from libvirt's POV.  
> > 
> > interesting, I did not know that about libvirt.
> > 
> > maybe libvirt could be fixed/improved to allow this patch to work?  
> 
> That would not be desirable. When libvirt reports a VM as shutoff,
> it is expected that all resources associated with the VM huave been
> fully released, such that they are available for launching a new
> VM.  We can't allow resources to be asynchronously released as that
> violates app's expectation that the resources are released once the
> VM is shutoff.

I do see your point. But I believe, a part of the problem is that
currently 'can start VM again' and 'all resources associated with
the previous run of the VM were cleaned up' are tied together. And
intuitively it makes a ton of sense. It is just that due to certain
reasons complete shutdown with complete cleanup takes way too long. So
we are looking for a solution to decouple the two. I believe complete
cleanup is inherently hard, so we should not hope solving that. Do you
agree?

Under the assumption that we won't be able to make the cleanup (making
all the resources available again) fast enough, I believe the only way
forward is coming up with a solution if the user explicitly says so
the assumption you just laid out should not be justified any  more.
Maybe something like enlightening the management software about this
'non-interchangeable resources are released, interchangeable resources
not yet fully released' state and add something like a 'force-start' 
operation, where the user explicitly opts-into potentially consuming more
resources (because of the overlap) for less downtime.

What do you think?

> 
> > surely without this patch an asynchronous teardown will not be possible
> > at all  
> 
> I appreciate that the current slow teardown is a pain, but async
> teardown does not sound like an appealing alternative given that
> the app can't use the resources again until the teardown is
> complete.

I don't fully agree with this. I think this statement disregards that
some resources are non-interchangeable in a sense that we need the exact
same resource free, while other resources are interchangeable in a sense
that we don't care which instance we get as long as we get enough
instances from a certain class. When we stop a VM and then start the same
VM again, we don't expect to get the same chunks of memory we had before,
but we just allocate new memory.

Yes we may run into trouble, but we may not. As long as we don't just
change this under the hood, but make it an option that somebody has
to consciously choose to enable, I think we are fine.

Regards,
Halil



reply via email to

[Prev in Thread] Current Thread [Next in Thread]