[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH V3 00/22] Live Update [restart] : exec
From: |
Daniel P . Berrangé |
Subject: |
Re: [PATCH V3 00/22] Live Update [restart] : exec |
Date: |
Mon, 14 Jun 2021 15:36:21 +0100 |
User-agent: |
Mutt/2.0.7 (2021-05-04) |
On Mon, Jun 14, 2021 at 10:31:32AM -0400, Steven Sistare wrote:
> On 6/7/2021 12:40 PM, Steven Sistare wrote:
> > On 6/3/2021 4:44 PM, Daniel P. Berrangé wrote:
> >> On Thu, Jun 03, 2021 at 08:36:42PM +0100, Dr. David Alan Gilbert wrote:
> >>> * Steven Sistare (steven.sistare@oracle.com) wrote:
> >>>> On 5/24/2021 6:39 AM, Dr. David Alan Gilbert wrote:
> >>>>> * Steven Sistare (steven.sistare@oracle.com) wrote:
> >>>>>> On 5/20/2021 9:13 AM, Dr. David Alan Gilbert wrote:
> >>>>>>> On the 'restart' branch of questions; can you explain,
> >>>>>>> other than the passing of the fd's, why the outgoing side of
> >>>>>>> qemu's 'migrate exec:' doesn't work for you?
> >>>>>>
> >>>>>> I'm not sure what I should describe. Can you be more specific?
> >>>>>> Do you mean: can we add the cpr specific bits to the migrate exec code?
> >>>>>
> >>>>> Yes; if possible I'd prefer to just keep the one exec mechanism.
> >>>>> It has an advantage of letting you specify the new command line; that
> >>>>> avoids the problems I'd pointed out with needing to change the command
> >>>>> line if a hotplug had happened. It also means we only need one chunk of
> >>>>> exec code.
> >>>>
> >>>> How/where would you specify a new command line? Are you picturing the
> >>>> usual migration
> >>>> setup where you start a second qemu with its own arguments, plus a
> >>>> migrate_incoming
> >>>> option or command? That does not work for live update restart; the old
> >>>> qemu must exec
> >>>> the new qemu. Or something else?
> >>>
> >>> The existing migration path allows an exec - originally intended to exec
> >>> something like a compressor or a store to file rather than a real
> >>> migration; i.e. you can do:
> >>>
> >>> migrate "exec:gzip > mig"
> >>>
> >>> and that will save the migration stream to a compressed file called mig.
> >>> Now, I *think* we can already do:
> >>>
> >>> migrate "exec:path-to-qemu command line parameters -incoming 'hmmmmm'"
> >>> (That's probably cleaner via the QMP interface).
> >>>
> >>> I'm not quite sure what I want in the incoming there, but that is
> >>> already the source execing the destination qemu - although I think we'd
> >>> probably need to check if that's actually via an intermediary.
> >>
> >> I don't think you can dirctly exec qemu in that way, because the
> >> source QEMU migration code is going to wait for completion of the
> >> QEMU you exec'd and that'll never come on success. So you'll end
> >> up with both QEMU's running forever. If you pass the daemonize
> >> option to the new QEMU then it will immediately detach itself,
> >> and the source QEMU will think the migration command has finished
> >> or failed.
> >>
> >> I think you can probably do it if you use a wrapper script though.
> >> The wrapper would have to fork QEMU in the backend, and then the
> >> wrapper would have to monitor the new QEMU to see when the incoming
> >> migration has finished/aborted, at which point the wrapper can
> >> exit, so the source QEMU sees a successful cleanup of the exec'd
> >> command. </hand waving>
> >
> > cpr restart does not work for any scheme that involves the old qemu process
> > co-existing with
> > the new qemu process. To preserve descriptors and anonymous memory, cpr
> > restart requires
> > that old qemu directly execs new qemu. Not fork-exec. Same pid.
> >
> > So responding to Dave's comment, "keep the one exec mechanism", that is not
> > possible.
> > We still need the qemu_exec_requested mechanism to cause a direct exec
> > after state is
> > saved.
> >
> >>>> We could shoehorn cpr restart into the migrate exec path by defining a
> >>>> new migration
> >>>> capability that the client would set before issuing the migrate command.
> >>>> However, the
> >>>> implementation code would be sprinkled with conditionals to suppress
> >>>> migrate-only bits
> >>>> and call cpr-only bits. IMO that would be less maintainable than having
> >>>> a separate
> >>>> cprsave function. Currently cprsave does not duplicate any migration
> >>>> functionality.
> >>>> cprsave calls qemu_save_device_state() which is used by xen.
> >>>
> >>> To me it feels like cprsave in particular is replicating more code.
> >>>
> >>> It's also jumping through hoops in places to avoid changing the
> >>> commandline; that's going to cause more pain for a lot of people - not
> >>> just because it's hacks all over for that, but because a lot of people
> >>> are actually going to need to change the commandline even in a cpr like
> >>> case (e.g. due to hotplug or changing something else about the
> >>> environment, like auth data or route to storage or networking that
> >>> changed).
> >>
> >> Management apps that already support migration, will almost certainly
> >> know how to start up a new QEMU with a different command line that
> >> takes account of hotplugged/unplugged devices. IOW avoiding changing
> >> the command line only really addresses the simple case, and the hard
> >> case is likely already solved for purposes of handling regular live
> >> migration.
> >
> > Agreed, with the caveat that for cpr, the management app must communicate
> > the new arguments
> > to the qemu-exec trampoline, rather than passing the args on the command
> > line to a new
> > qemu process.
> >
> >>> There are hooks for early parameter parsing, so if we need to add extra
> >>> commandline args we can; but for example the case of QEMU_START_FREEZE
> >>> to add -S just isn't needed as soon as you let go of the idea of needing
> >>> an identical commandline.
> >
> > I'll delete QEMU_START_FREEZE.
> >
> > I still need to preserve argv_main and pass it to the qemu-exec trampoline,
> > though, as
> > the args contain identifying information that the management app needs to
> > modify the
> > arguments based the the instances's hot plug history.
> >
> > Or, here is another possibility. We could redefine cprsave to leave the VM
> > in a
> > stopped state, and add a cprstart command to be called subsequently that
> > performs
> > the exec. It takes a single string argument: a command plus arguments to
> > exec.
> > The command may be qemu or a trampoline like qemu-exec. I like that the
> > trampoline
> > name is no longer hardcoded. The management app can derive new qemu args
> > for the
> > instances as it would with migration, and pass them to the command, instead
> > of passing
> > them to qemu-exec via some side channel. cprload finishes the job and does
> > not change.
> > I already like this scheme better.
>
> Or, pass argv as an additional parameter to cprsave.
>
> Daniel, David, do you like passing argv to cprsave or a new cprstart command
> better than the
> current scheme? I am ready to sent V4 of the series after we resolve this
> and the question of
> whether or not to fold cpr into the migration command.
I don't really have a strong opinion on this either way.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|