
Re: [Qemu-devel] [PATCH v3 0/4] migation: unbreak postcopy recovery


From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v3 0/4] migation: unbreak postcopy recovery
Date: Mon, 2 Jul 2018 16:46:18 +0800
User-agent: Mutt/1.10.0 (2018-05-17)

On Mon, Jul 02, 2018 at 01:34:45PM +0530, Balamuruhan S wrote:
> On Wed, Jun 27, 2018 at 09:22:42PM +0800, Peter Xu wrote:
> > v3:
> > - keep the recovery logic even for RDMA by dropping the 3rd patch and
> >   touch up the original 4th patch (current 3rd patch) to suite that [Dave]
> > 
> > v2:
> > - break the first patch into several
> > - fix a QEMUFile leak
> > 
> > Please review.  Thanks,
> Hi Peter,

Hi, Balamuruhan,

Glad to know that you are playing with this stuff on ppc.  I think the
major steps are correct, though...

> 
> I have applied this patchset on top of upstream QEMU to test the
> postcopy pause/recover feature on PowerPC.
> 
> I used an NFS-shared qcow2 image between the source and target hosts.
> 
> source:
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio \
> -net user -redir tcp:2000::22
> 
> To keep the VM under load, I ran stress-ng inside the guest:
> 
> # stress-ng --cpu 6 --vm 6 --io 6
> 
> target:
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio \
> -net user -redir tcp:2001::22 -incoming tcp:0:4445
> 
> I enabled postcopy on both source and destination from the QEMU monitor:
> 
> (qemu) migrate_set_capability postcopy-ram on
> 
> From the source QEMU monitor:
> (qemu) migrate -d tcp:10.45.70.203:4445

[1]

> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
> release-ram: off block: off return-path: off pause-before-switchover:
> off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
> late-block-activate: off 
> Migration status: active
> total time: 2331 milliseconds
> expected downtime: 300 milliseconds
> setup: 65 milliseconds
> transferred ram: 38914 kbytes
> throughput: 273.16 mbps
> remaining ram: 67063784 kbytes
> total ram: 67109120 kbytes
> duplicate: 1627 pages
> skipped: 0 pages
> normal: 9706 pages
> normal bytes: 38824 kbytes
> dirty sync count: 1
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> 
> triggered postcopy from source,
> (qemu) migrate_start_postcopy
> 
> After triggering postcopy from the source, I tried to pause the
> postcopy migration from the target:
> 
> (qemu) migrate_pause
> 
> On the target I see this error:
> error while loading state section id 4(ram)
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> 
> On the source I see this error:
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> 
> Later, I tried to recover from the target monitor:
> (qemu) migrate_recover qemu+ssh://10.45.70.203/system

... here, is that URI intended for libvirt only?

Normally I'll use something similar to [1] above.

> Migrate recovery is triggered already

And this means that you have already sent one recovery command
beforehand.  In the future we'd better allow the recovery command to
be run more than once (in case the first one was mistyped...).

> 
> but the source still remains in the postcopy-paused state:
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
> release-ram: off block: off return-path: off pause-before-switchover:
> off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
> late-block-activate: off 
> Migration status: postcopy-paused
> total time: 222841 milliseconds
> expected downtime: 382991 milliseconds
> setup: 65 milliseconds
> transferred ram: 385270 kbytes
> throughput: 265.06 mbps
> remaining ram: 8150528 kbytes
> total ram: 67109120 kbytes
> duplicate: 14679647 pages
> skipped: 0 pages
> normal: 63937 pages
> normal bytes: 255748 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> dirty pages rate: 854740 pages
> postcopy request count: 374
> 
> Later I also tried to recover postcopy from the source monitor:
> (qemu) migrate_recover qemu+ssh://10.45.193.21/system

This command should be run on the destination side only.  The
"migrate-recover" command on the destination will start a new
listening port there, waiting for the migration to be continued.
After that command, we need an extra command on the source to start
the recovery:

  (HMP) migrate -r $URI

Here $URI should be the one you specified in the "migrate-recover"
command on the destination machine.
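
For reference, the whole pause/recover sequence could be sketched as an
HMP transcript against the two monitors opened earlier in this thread
(the listening port 4446 is an arbitrary illustrative choice; the
addresses match your setup above):

```
# Destination monitor (telnet 127.0.0.1:1235): open a fresh port that
# will accept the resumed migration stream.
(qemu) migrate_recover tcp:0:4446

# Source monitor (telnet 127.0.0.1:1234): resume ("-r") the paused
# migration toward the destination's new port.
(qemu) migrate -r tcp:10.45.70.203:4446

# Then "info migrate" on the source should move from postcopy-paused
# back to postcopy-active.
```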

> Migrate recover can only be run when postcopy is paused.

I can try to fix up this error message.  Basically we shouldn't allow
this command to be run on the source machine.

> 
> It looks to be broken; please help me if I missed something
> in this test.

Btw, I'm writing up a unit test for postcopy recovery recently, which
could be a good reference for the new feature.  Meanwhile I think I
should write up some documentation afterwards too.

Regards,

> 
> Thank you,
> Bala
> > 
> > Peter Xu (4):
> >   migration: delay postcopy paused state
> >   migration: move income process out of multifd
> >   migration: unbreak postcopy recovery
> >   migration: unify incoming processing
> > 
> >  migration/ram.h       |  2 +-
> >  migration/exec.c      |  3 ---
> >  migration/fd.c        |  3 ---
> >  migration/migration.c | 44 ++++++++++++++++++++++++++++++++++++-------
> >  migration/ram.c       | 11 +++++------
> >  migration/savevm.c    |  6 +++---
> >  migration/socket.c    |  5 -----
> >  7 files changed, 46 insertions(+), 28 deletions(-)
> > 
> > -- 
> > 2.17.1
> > 
> > 
> 

-- 
Peter Xu


