Re: [Qemu-devel] [PATCH RFC 0/2] Fix migration issues


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH RFC 0/2] Fix migration issues
Date: Wed, 24 Oct 2018 22:27:42 +0100
User-agent: Mutt/1.10.1 (2018-07-13)

On Mon, Oct 22, 2018 at 07:08:52PM +0800, Fei Li wrote:
> Hi,
> these two patches are to fix live migration issues. The first is
> about multifd, and the second is to fix some error handling.
> 
> But I have a question about using multifd migration.
> In our current code, when multifd is used during migration, if there
> is an error before the destination receives all new channels (I mean
> multifd_recv_new_channel(ioc)), the destination does not exit but
> keeps waiting (it hangs in recvmsg() in qio_channel_socket_readv)
> until the source exits.
> 
> My question is about the state of the destination host if migration
> fails during this period. I ran a test: after applying patch [1/2], if
> multifd_new_send_channel_async() fails, the destination host hangs for
> a while and then pops up a window saying
>     "'QEMU (...) [stopped]' is not responding.
>     You may choose to wait a short while for it to continue or force
>     the application to quit entirely."
> But after I close the window by clicking, the qemu on the dest still
> hangs there until I explicitly kill the qemu on the source.
> 
> The source host keeps running as expected, but I guess the hang
> phenomenon in the dest is not right.
> Would someone kindly give some suggestions on this? Thanks a lot.

Note that KVM Forum is taking place this week, so responses from
anyone might be slow until it ends.

I think what you described is expected, since we can't guarantee that
the network is always stable.  Normally I'd expect the migration to
fail, but that shouldn't matter much: this is precopy after all, so we
lose nothing.  So I'm curious: when the error you mentioned happens
(e.g., the total channel number is N but only M channels got
connected, with M < N), could you simply kill the destination?
Then AFAIU the source can just continue to run, right?
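
To make the M < N case concrete, here is a toy model in Python.  It is
only a sketch, not QEMU code: the names wait_for_channels() and
simulate() are made up for illustration, and the timeout is an
assumption of the sketch rather than something the multifd code does.
It shows a receiver that expects N channels and gives up, instead of
blocking forever, when only M of them ever arrive.

```python
import queue


def wait_for_channels(incoming, expected, timeout):
    """Collect channels until `expected` have arrived or one wait times out.

    `incoming` stands in for the listening socket; a blocking get()
    without a timeout would model the hang described in the thread.
    """
    channels = []
    while len(channels) < expected:
        try:
            channels.append(incoming.get(timeout=timeout))
        except queue.Empty:
            break  # no further channel arrived in time: stop waiting
    return channels


def simulate(expected, connected, timeout=0.1):
    """Model a source that opens only `connected` of `expected` channels."""
    incoming = queue.Queue()
    for i in range(connected):
        incoming.put(f"channel-{i}")
    got = wait_for_channels(incoming, expected, timeout)
    # (all channels present?, how many actually arrived)
    return len(got) == expected, len(got)


if __name__ == "__main__":
    print(simulate(expected=4, connected=4))
    print(simulate(expected=4, connected=2))
```

With all N channels present the receiver proceeds; with M < N it
returns after one timed-out wait, and at that point the destination
can be torn down while the source keeps running.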

> 
> 
> Fei Li (2):
>   migration: fix the multifd code
>   migration: fix some error handling
> 
>  migration/migration.c    |  5 +----
>  migration/postcopy-ram.c |  3 +++
>  migration/ram.c          | 33 +++++++++++++++++++++++----------
>  migration/ram.h          |  2 +-
>  4 files changed, 28 insertions(+), 15 deletions(-)
> 
> -- 
> 2.13.7
> 

Regards,

-- 
Peter Xu


