qemu-devel

Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] migration: Fix possible bug for migrate cancel
Date: Fri, 28 Mar 2014 10:28:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 28/03/2014 10:18, Gonglei (Arei) wrote:
> > Can you please give more details at how you are triggering the problem
> > with libvirt?  I think Paolo is probably right - the bug is more likely
> > to be in libvirt not expecting the race and not recovering correctly
> > when the race occurs, than it is to be in changing qemu's state algorithm.
> >
> When the migration progress reaches 100%, the migration status becomes
> MIG_STATE_COMPLETED in QEMU, but it takes some time from MIG_STATE_COMPLETED
> until the migration thread's resources are released.
> If we cancel the migration during this window, the migrate_fd_cancel function
> returns directly without reporting an error code. libvirt then considers the
> cancel operation a success, contrary to the facts.

There is no error: once migration has completed, you can still shut down the destination and continue on the source. Libvirt should either:

1) poll with "query-migrate" after migrate_cancel, and report an error there if those are the desired semantics;

2) set a "cancelled" flag before asking QEMU to cancel the migration, and check it in the migration functions after "query-migrate" reports completion; if it is set, do not resume on the destination.
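For illustration, the two options above might look roughly like this on the management side. This is only a minimal Python sketch under stated assumptions: `FakeQMP`, `cancel_with_poll` and `MigrationJob` are hypothetical names, not libvirt or QEMU code. `FakeQMP` simulates just the race under discussion, where `migrate_cancel` arrives after migration has already reached "completed" and still returns success. Only the command names `migrate_cancel` and `query-migrate` and the status strings are taken from QEMU's QMP interface.

```python
import time


class FakeQMP:
    """Hypothetical stand-in for a QMP connection (illustration only)."""

    def __init__(self, status="active"):
        self.status = status  # one of QEMU's query-migrate status strings

    def execute(self, command):
        if command == "migrate_cancel":
            # Cancellation only takes effect while migration is still running;
            # after "completed" it is a silent no-op, as in the reported race.
            if self.status in ("setup", "active"):
                self.status = "cancelled"
            return {}
        if command == "query-migrate":
            return {"status": self.status}
        raise ValueError(f"unsupported command: {command}")


def cancel_with_poll(qmp, max_polls=10, interval=0.01):
    """Option 1: after migrate_cancel, poll query-migrate and report the result."""
    qmp.execute("migrate_cancel")
    for _ in range(max_polls):
        status = qmp.execute("query-migrate")["status"]
        if status == "completed":
            return "error: migration already completed"
        if status in ("cancelled", "failed"):
            return status
        time.sleep(interval)  # still setup/active: wait and poll again
    return "error: timed out waiting for cancellation"


class MigrationJob:
    """Option 2: remember that a cancel was requested (hypothetical names)."""

    def __init__(self, qmp):
        self.qmp = qmp
        self.cancelled = False

    def cancel(self):
        self.cancelled = True  # set the flag *before* asking QEMU to cancel
        self.qmp.execute("migrate_cancel")

    def on_query_migrate(self):
        """Called by the migration code after each query-migrate."""
        status = self.qmp.execute("query-migrate")["status"]
        if status != "completed":
            return f"not completed ({status})"
        if self.cancelled:
            return "shut down destination, continue on source"
        return "resume on destination"


print(cancel_with_poll(FakeQMP("completed")))  # error: migration already completed
job = MigrationJob(FakeQMP("completed"))
job.cancel()
print(job.on_query_migrate())  # shut down destination, continue on source
```

In both variants the cancel command's own return value is never trusted on its own; the decision is made from what "query-migrate" reports.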

Another reason for doing it in libvirt is that the serialization between cancellation and completion of migration is ultimately controlled by libvirt's lock. Doing this in QEMU would make it harder to reason about concurrency.
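A minimal Python sketch of why a single management-side lock keeps the cancel/complete decision easy to reason about. This rests on assumptions: `threading.Lock` stands in for libvirt's job lock, and `Job`, `request_cancel`, `on_completed` and `run_once` are hypothetical names, not libvirt code. Because both paths take the same lock, a cancel that reports success always ends with the VM continuing on the source, and a cancel that arrives after completion was handled reports failure.

```python
import threading


class Job:
    """Hypothetical management-side migration job guarded by one lock.

    threading.Lock stands in for libvirt's job lock. request_cancel and
    on_completed both run under it, so they are serialized.
    """

    def __init__(self):
        self.lock = threading.Lock()
        self.cancelled = False
        self.outcome = None  # decided exactly once, by whichever side wins

    def request_cancel(self):
        with self.lock:
            if self.outcome is not None:
                return False  # completion already handled: report failure
            self.cancelled = True
            # A real implementation would send migrate_cancel to QEMU here.
            return True

    def on_completed(self):
        """Called when query-migrate reports "completed"."""
        with self.lock:
            if self.outcome is None:
                self.outcome = "source" if self.cancelled else "destination"
            return self.outcome


def run_once():
    """Race a cancel against completion; return (cancel_ok, where_vm_runs)."""
    job = Job()
    results = {}
    canceller = threading.Thread(
        target=lambda: results.update(cancel=job.request_cancel()))
    completer = threading.Thread(
        target=lambda: results.update(outcome=job.on_completed()))
    canceller.start()
    completer.start()
    canceller.join()
    completer.join()
    return results["cancel"], results["outcome"]


for _ in range(200):
    cancel_ok, where = run_once()
    # Invariant: a successful cancel always means the VM stays on the source.
    assert cancel_ok == (where == "source")
```

Whichever thread takes the lock first, the invariant holds, because the flag and the final decision are only ever read and written under that one lock.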

Paolo
