Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs
From: Juan Quintela
Subject: Re: [PATCH 2/2] failover: don't allow to migrate a paused VM that needs PCI unplug
Date: Tue, 02 Nov 2021 16:28:13 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
"Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Sep 29, 2021 at 04:43:11PM +0200, Laurent Vivier wrote:
>> As the guest OS is paused, we will never receive the unplug event
>> from the kernel and the migration cannot continue.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>
> Well ... what if user previously did
>
> pause
> start migration
> unpause
>
> we are breaking it now for no good reason.
No. We are canceling the migration. Migration cannot finish in that
state. We are inside the test:
if (migration_in_setup(s) && !should_be_hidden) {
If you don't have a really weird setup[1], migration setup takes just
milliseconds (single digits for a small guest, and 200-300ms for really
huge ones).
So I still think this is right.
1: Weird here means things like RDMA, where locking all the memory of
one guest can take forever. To get an idea about this, until we
introduced RDMA we didn't measure the setup stage time, because it
was so small that it didn't matter at all.
Unplug from the guest is another operation that can take quite a long
time, because it depends on guest cooperation.
> Further, how about
>
> start migration
> pause
>
> are we going to break this too? by failing pause?
I haven't thought about this one, but it shouldn't matter (famous last
words), because there are two cases:
- migration has started and unplug has already finished: no problem.
- migration has started but we haven't yet arrived at
virtio_net_handle_migration_primary(). We are paused, and we give a
good error message about why we are failing. Notice that migration
can't finish anyway: it would be stuck there forever, waiting for the
(stopped) guest to unplug the device.
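The two cases above can be written down as a tiny decision function. This is only a sketch to make the reasoning explicit: outcome_t and pause_after_migration_start() are hypothetical names invented for illustration, not QEMU code and not what the patch does line by line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two cases; none of these names exist in QEMU. */
typedef enum {
    OUTCOME_CONTINUE,   /* unplug already finished: pausing is harmless */
    OUTCOME_FAIL_EARLY  /* guest stopped before unplug: fail with an error */
} outcome_t;

/*
 * Migration has already started and the user pauses the guest.
 * unplug_finished says whether the guest has completed the unplug of
 * the primary device at that point.
 */
static outcome_t pause_after_migration_start(bool unplug_finished)
{
    if (unplug_finished) {
        /* Case 1: nothing is waiting on guest cooperation any more. */
        return OUTCOME_CONTINUE;
    }
    /*
     * Case 2: a stopped guest can never acknowledge the unplug, so the
     * migration would wait forever. Failing early with a clear message
     * is the better result.
     */
    return OUTCOME_FAIL_EARLY;
}
```

The point of the model is that there is no third outcome where the migration completes while the guest is stopped and the device is still plugged.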
So the only case that I can see that *could* matter is:
- start migration
- pause the guest
this implies pausing the migration
- unpause
at this point we can continue the migration
Do we really care about this scenario?
I think not, because the migration has advanced so little that starting
from zero would be the best option anyway.
Later, Juan.
PD1: No, I am not sure what happens if you run "pause" after the event
to the guest is sent, but before the guest finishes the unplug (I
guess it would stall). But in this case, we are doing something at
least fishy. On the other hand, we know that "pause; migration"
will never really work.
PD2: Perhaps we could "invent" another state that means
IN_SETUP_AND_WE_CANT_BE_PAUSED, and hold it from the moment we ask for
the device to unplug until it actually unplugs. But it looks really
complicated.
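To show what such a state would involve, here is a rough sketch. Everything in it is hypothetical: the sketch_state_t enum, the helper names and the transitions are assumptions made for illustration, and none of them correspond to existing QEMU migration states or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; not the real QEMU MigrationStatus values. */
typedef enum {
    SKETCH_SETUP,           /* ordinary setup stage */
    SKETCH_SETUP_NO_PAUSE,  /* stands in for IN_SETUP_AND_WE_CANT_BE_PAUSED */
    SKETCH_ACTIVE           /* setup finished, migration running */
} sketch_state_t;

/* We asked the guest to unplug: from now on a pause would stall us. */
static sketch_state_t sketch_unplug_requested(sketch_state_t s)
{
    return s == SKETCH_SETUP ? SKETCH_SETUP_NO_PAUSE : s;
}

/* The guest finished the unplug: pausing is safe again. */
static sketch_state_t sketch_unplug_done(sketch_state_t s)
{
    return s == SKETCH_SETUP_NO_PAUSE ? SKETCH_SETUP : s;
}

/* The pause command would have to consult the migration state. */
static bool sketch_pause_allowed(sketch_state_t s)
{
    return s != SKETCH_SETUP_NO_PAUSE;
}
```

Even in this toy form, the pause path has to know about migration internals, which is part of why it looks complicated.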