Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [PATCH 0/7] migration: pause-before-device
Date: Thu, 12 Oct 2017 11:02:44 +0100
User-agent: Mutt/1.9.0 (2017-09-02)
On Wed, Oct 11, 2017 at 08:13:10PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <address@hidden>
>
> Hi,
> This set attempts to make a race condition between migration and
> drive-mirror (and other block users) soluble by allowing the migration
> to be paused after the source qemu releases the block devices but
> before the serialisation of the device state.
>
> The symptom of this failure, as reported by Wangjie, is a:
> _co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed
>
> and the source qemu dying; so the problem is pretty nasty.
> This has only been seen on 2.9 onwards, but the theory is that
> prior to 2.9 it might have been happening anyway and we were
> perhaps getting unreported corruptions (lost writes); so this
> really needs fixing.
>
> This flow came from discussions between Kevin and me, and we can't
> see a way of fixing it without exposing a new state to the management
> layer.
>
> The flow is now:
>
> (qemu) migrate_set_capability pause-before-device on
How about 'switchover-cleanup' for the capability name?
> (qemu) migrate -d ...
> (qemu) info migrate
> ...
> Migration status: pause-before-device
and 'switchover' for the migration status.
> ...
> << issue commands to clean up any block jobs>>
>
> (qemu) migrate_continue pause-before-device
> (qemu) info migrate
> ...
> Migration status: completed
>
> This set has been _very_ lightly tested just at the normal migration
> code, without the addition of the drive mirror; so this is a first
> cut. I'd appreciate some feedback from libvirt on whether the interface
> is OK and ideally a hack to test it in a full libvirt setup to see
> if we hit any other issues.
>
> The precopy flow is:
> active->pause-before-device->completed
>
> The postcopy flow is:
> active->pause-before-device->postcopy-active->completed
>
> Although the behaviour with postcopy only gets interesting when
> we add something like Max's active-sync.
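For reference, the two flows quoted above can be written down as a small
transition table. This is only an illustrative sketch in Python, not QEMU
code: the state names are the ones from the cover letter, while ALLOWED and
is_valid_flow are hypothetical helpers, and the table only covers the flows
listed above with the capability enabled (no cancel or failure paths).

```python
# Illustrative only: state names come from the cover letter; ALLOWED and
# is_valid_flow are hypothetical names, not anything defined by QEMU.
# Only the flows listed with the capability enabled are modelled here.
ALLOWED = {
    "active": {"pause-before-device"},
    "pause-before-device": {"completed", "postcopy-active"},
    "postcopy-active": {"completed"},
    "completed": set(),
}


def is_valid_flow(states):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(b in ALLOWED.get(a, set()) for a, b in zip(states, states[1:]))


precopy = ["active", "pause-before-device", "completed"]
postcopy = ["active", "pause-before-device", "postcopy-active", "completed"]
```

A management layer that polls the migration status could use a check along
these lines to notice that it has skipped the pause state, though how it
actually tracks state is up to that layer.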
>
> Please argue about the command and state naming.
Argued above :-)
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|