qemu-devel

Re: [PATCH V2 00/11] Live update: cpr-exec (reconnections)


From: Steven Sistare
Subject: Re: [PATCH V2 00/11] Live update: cpr-exec (reconnections)
Date: Tue, 20 Aug 2024 12:28:39 -0400
User-agent: Mozilla Thunderbird

On 8/13/2024 4:12 PM, Peter Xu wrote:
On Wed, Aug 07, 2024 at 03:47:47PM -0400, Steven Sistare wrote:
On 8/4/2024 12:10 PM, Peter Xu wrote:
On Sat, Jul 20, 2024 at 05:26:07PM -0400, Steven Sistare wrote:
On 7/18/2024 11:56 AM, Peter Xu wrote:
[...]
Lastly, there is no loss of connectivity to the guest,
because chardev descriptors remain open and connected.

Again, I raised the question of why this would matter, since after all the mgmt
app will need to cope with reconnections because it needs to support generic
live migration, in which case reconnection is a must.

So far it doesn't sound like a performance critical path, for example, to
do the mgmt reconnects on the ports.  So this might be an optimization that
most mgmt apps may not care much?

Perhaps.  I view the chardev preservation as nice to have, but not essential.
It does not appear in this series, other than in docs.  It's easy to implement
given the CPR foundation.  I suggest we continue this discussion when I post
the chardev series, so we can focus on the core functionality.

It's just that it can affect our decision on choosing the way to go.

For example, do we have someone from Libvirt or any mgmt layer who can help
justify this point?

As I said, I thought most facilities for reconnection should be ready, but
I could miss important facts in mgmt layers..

I will more deeply study reconnects in the mgmt layer, run some experiments to
see if it is seamless for the end user, and get back to you, but it will take
some time.

See below.

[...]
Could I ask what management code you're working on?  Why that management
code doesn't need to already work out these problems with reconnections
(like pre-CPR ways of live upgrade)?

OCI - Oracle Cloud Infrastructure.
Mgmt needs to manage reconnections for live migration, and perhaps I could
leverage that code for live update, but happily I did not need to.  Regardless,
reconnection is the lesser issue.  The bigger issue is resource management and
the container environment.  But I cannot justify that statement in detail
without actually trying to implement cpr-transfer in OCI.

[...]

The use case is the same for both modes, but they are simply different
transport methods for moving descriptors from old QEMU to new.  The developer
of the mgmt agent should be allowed to choose.

It's out of my capability to review the mgmt impact on this one.  This is
all based on the idea that I think most mgmt apps support reconnections
pretty well. If that's the case, I'd definitely go for the transfer mode.

Closing the loop here on reconnections --

The managers I studied do not reconnect QEMU chardevs such as the guest console
after live migration.  In all cases, the old console goes dark and the user must
manually reconnect to the console on the target.

OCI does not auto-reconnect.  libvirt does not; one must reconnect through
libvirtd on the target.  kubevirt does not, AFAICT; one must reconnect on the
target using virtctl console.

Thus chardev preservation does offer an improved user experience in this regard.
chardevs can be preserved using either cpr-exec or cpr-transfer.  But if QEMU
runs in a containerized environment that has agents that proxy connections
between QEMU chardevs and the outside world, then only cpr-exec (which
preserves the existing container) preserves connections end-to-end.  OCI has
such agents.  I believe kubevirt does also.
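
To illustrate the general mechanism behind "descriptors remain open and
connected", here is a minimal sketch of a connected socket surviving an
exec.  It is not QEMU code and does not use any CPR interface.  The function
name survive_exec and the message text are hypothetical, and a forked child
stands in for the process that replaces its own image.  It assumes a POSIX
host.

```python
import os
import socket
import sys


def survive_exec() -> bytes:
    """Show that a connected descriptor stays usable across exec (POSIX only)."""
    # 'peer' plays the outside world; 'held' is the end owned by the
    # process that is about to replace its own image.
    peer, held = socket.socketpair()

    # Python marks descriptors close-on-exec by default; clear that flag
    # so the descriptor is still open in the new program image.
    os.set_inheritable(held.fileno(), True)

    pid = os.fork()
    if pid == 0:
        # Child: stands in for the old process.  It execs a new image and
        # hands over only the descriptor number, on the command line.
        peer.close()
        new_image = (
            "import os, sys; fd = int(sys.argv[1]); "
            "os.write(fd, b'still connected'); os.close(fd)"
        )
        try:
            os.execv(
                sys.executable,
                [sys.executable, "-c", new_image, str(held.fileno())],
            )
        finally:
            # Reached only if exec itself failed.
            os._exit(127)

    # Parent: the outside world never reconnects; it just keeps reading
    # from the same socket it already had.
    held.close()
    data = peer.recv(64)
    os.waitpid(pid, 0)
    peer.close()
    return data
```

The peer end never observes a disconnect, because the kernel object behind
the descriptor is untouched by exec.  If the new process were instead
started in a different container behind a different proxy agent, the peer
would have to establish a new connection, which is the distinction drawn
above.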

- Steve


