qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/1] migration: Terminate multifd threads on yank


From: Lukas Straub
Subject: Re: [PATCH 1/1] migration: Terminate multifd threads on yank
Date: Tue, 3 Aug 2021 06:41:27 +0000

On Fri, 30 Jul 2021 04:40:45 -0300
Leonardo Bras <leobras@redhat.com> wrote:

> From source host viewpoint, losing a connection during migration will
> cause the sockets to get stuck in sendmsg() syscall, waiting for
> the receiving side to reply.
> 
> In migration, yank works by shutting-down the migration QIOChannel fd.
> This causes a failure in the next sendmsg() for that fd, and the whole
> migration gets cancelled.
> 
> In multifd, due to having multiple sockets in multiple threads,
> on a connection loss there will be extra sockets stuck in sendmsg(),
> and because they will be holding their own mutex, there is good chance
> the main migration thread can get stuck in multifd_send_pages()
> waiting for one of those mutexes.
> 
> While it's waiting, the main migration thread can't run sendmsg() on
> it's fd, and therefore can't cause the migration to be cancelled, thus
> causing yank not to work.
> 
> Fixes this by shutting down all migration fds (including multifd ones),
> so no thread get's stuck in sendmsg() while holding a lock, and thus
> allowing the main migration thread to properly cancel migration when
> yank is used.
> 
> There is no need to do the same procedure to yank to work in the
> receiving host since ops->recv_pages() is kept outside the mutex protected
> code in multifd_recv_thread().
> 
> Buglink:https://bugzilla.redhat.com/show_bug.cgi?id=1970337
> Reported-by: Li Xiaohui <xiaohli@redhat.com>
> Signed-off-by: Leonardo Bras <leobras@redhat.com>
> ---

Hi,
There is an easier explanation: I forgot the send side of multifd
altogether (I thought it was covered by migration_channel_connect()).
So yank won't actually shutdown() the multifd sockets on the send side.

In the bugreport you wrote
> (As a test, I called qio_channel_shutdown() in every multifd iochannel and 
> yank worked just fine, but I could not retry migration, because it was still 
> 'ongoing')
That sounds like a bug in the error handling for multifd. But quickly
looking at the code, it should properly fail the migration.

BTW: You can shutdown outgoing sockets from outside of qemu with the
'ss' utility, like this: 'sudo ss -K dst <destination ip> dport = <destination 
port>'

Regards,
Lukas Straub

Attachment: pgpLkfACw1nu6.pgp
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]