qemu-devel

Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug


From: Juan Quintela
Subject: Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
Date: Tue, 02 Nov 2021 16:28:13 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>> As the guest OS is paused, we will never receive the unplug event
>> from the kernel and the migration cannot continue.
>> 
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Well ... what if user previously did
>
> pause
> start migration
> unpause
>
> we are breaking it now for no good reason.

No.  We are cancelling the migration; migration cannot finish in that
state.  We are inside the test:

      if (migration_in_setup(s) && !should_be_hidden) {

Unless you have a really weird setup[1], migration setup takes just
milliseconds (low single digits for small guests, and 200-300 ms for
really huge ones).

So I still think this is right.
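A minimal C sketch of the rule being defended here: if migration is still in SETUP, the failover primary still has to be unplugged (roughly what `!should_be_hidden` means in the test above), and the VM is paused, the migration is cancelled instead of waiting for an unplug that will never come. All type, field, and function names below are hypothetical simplifications, not QEMU's actual API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified migration states (not QEMU's MigrationStatus). */
typedef enum { SK_MIG_SETUP, SK_MIG_ACTIVE, SK_MIG_CANCELLED } SkMigState;

typedef struct {
    SkMigState state;
    bool vm_running;           /* false while the guest is paused */
    bool primary_needs_unplug; /* roughly: the primary is not hidden yet */
} SkMigration;

/*
 * Returns true if migration may keep waiting for the guest to unplug the
 * primary device.  A paused guest will never process the unplug request,
 * so in that case the migration is cancelled with an explanatory error.
 */
static bool sk_check_failover_setup(SkMigration *m, char *err, size_t errlen)
{
    if (m->state == SK_MIG_SETUP && m->primary_needs_unplug && !m->vm_running) {
        m->state = SK_MIG_CANCELLED;
        snprintf(err, errlen,
                 "cannot migrate a paused VM that needs PCI unplug");
        return false;
    }
    return true;
}
```

Because SETUP normally lasts only milliseconds, checking once at this point costs nothing, and the paused case gets a clear error rather than a migration that hangs.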


1: Weird here means things like RDMA, where locking all the memory of
   one guest can take forever.  To give an idea, until we introduced
   RDMA we didn't even measure the setup stage time, because it was so
   small that it didn't matter at all.

Unplugging from the guest is another operation that can take quite a
long time, because it depends on guest cooperation.

> Further, how about
>
> start migration
> pause
>
> are we going to break this too? by failing pause?

I haven't thought about this one, but it shouldn't matter (famous last
words), because there are two cases:

- migration has started and unplug has already finished, no problem.

- migration has started but we haven't yet reached
  virtio_net_handle_migration_primary().  We are paused, and we give a
  good error message about why we are failing.  Notice that migration
  can't finish anyway; it would be stuck there forever waiting for the
  (stopped) guest to unplug the device.
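The two cases above can be sketched as a small C decision function. The enum and function names are hypothetical. A third progress value (unplug requested but not yet finished) is included only because the message leaves that case open further down; the sketch labels it as unknown rather than predicting what QEMU actually does.

```c
/* Hypothetical progress of the failover unplug when the user pauses. */
typedef enum {
    UNPLUG_NOT_REQUESTED, /* virtio_net_handle_migration_primary() not reached */
    UNPLUG_REQUESTED,     /* request sent, guest has not finished it */
    UNPLUG_FINISHED       /* primary already unplugged */
} UnplugProgress;

typedef enum {
    PAUSE_NO_PROBLEM,      /* migration can proceed normally */
    PAUSE_FAIL_WITH_ERROR, /* fail now with a clear message instead of hanging */
    PAUSE_UNKNOWN          /* not analysed; may stall */
} PauseOutcome;

/* Classifies "start migration; pause" by how far the unplug has got. */
static PauseOutcome pause_after_migration_start(UnplugProgress p)
{
    switch (p) {
    case UNPLUG_FINISHED:
        return PAUSE_NO_PROBLEM;
    case UNPLUG_NOT_REQUESTED:
        /* The stopped guest could never complete the unplug. */
        return PAUSE_FAIL_WITH_ERROR;
    case UNPLUG_REQUESTED:
    default:
        return PAUSE_UNKNOWN;
    }
}
```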

So the only case that I can see that *could* matter is:

- start migration
- pause the guest
   this implies pausing the migration
- unpause
   at this point we can continue the migration

Do we really care about this scenario?

I think not, because the migration has advanced so little that starting
from zero would be the best option anyway.

Later, Juan.

PD1: No, I am not sure what happens if you run "pause" after the event
     is sent to the guest but before the guest finishes the unplug (I
     guess it would stall).  But in that case we are doing something at
     least fishy.  On the other hand, we know that "pause; migration"
     will never really work.

PD2: Perhaps we could "invent" another state that means
IN_SETUP_AND_WE_CANT_BE_PAUSED, and stay in it between the moment we ask
the device to unplug and the moment it actually unplugs.  But it looks
really complicated.
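The extra state suggested in PD2 could look roughly like this minimal C sketch: while the unplug request is outstanding, a pause is refused, so the guest can always finish the unplug. The state names (including the stand-in for IN_SETUP_AND_WE_CANT_BE_PAUSED) and the functions are hypothetical; QEMU does not define them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical failover-aware migration states (not defined by QEMU). */
typedef enum {
    FO_SETUP,
    FO_SETUP_CANT_PAUSE, /* stand-in for IN_SETUP_AND_WE_CANT_BE_PAUSED */
    FO_ACTIVE
} FoState;

typedef struct {
    FoState state;
    bool paused;
} FoMigration;

/* Entered when we ask the guest to unplug the primary device. */
static void fo_request_unplug(FoMigration *m)
{
    assert(m->state == FO_SETUP && !m->paused);
    m->state = FO_SETUP_CANT_PAUSE;
}

/* Left when the guest reports that the device is gone. */
static void fo_unplug_completed(FoMigration *m)
{
    assert(m->state == FO_SETUP_CANT_PAUSE);
    m->state = FO_ACTIVE;
}

/* Refuses to pause while the guest still has to finish the unplug. */
static bool fo_try_pause(FoMigration *m)
{
    if (m->state == FO_SETUP_CANT_PAUSE) {
        return false;
    }
    m->paused = true;
    return true;
}
```

The sketch also hints at why it looks complicated: every path that can pause the VM would have to consult this state.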



