Re: [Qemu-devel] [PATCH v3 0/4] migation: unbreak postcopy recovery


From: Balamuruhan S
Subject: Re: [Qemu-devel] [PATCH v3 0/4] migation: unbreak postcopy recovery
Date: Mon, 2 Jul 2018 15:12:41 +0530
User-agent: Mutt/1.9.2 (2017-12-15)

On Mon, Jul 02, 2018 at 04:46:18PM +0800, Peter Xu wrote:
> On Mon, Jul 02, 2018 at 01:34:45PM +0530, Balamuruhan S wrote:
> > On Wed, Jun 27, 2018 at 09:22:42PM +0800, Peter Xu wrote:
> > > v3:
> > > - keep the recovery logic even for RDMA by dropping the 3rd patch and
> > >   touching up the original 4th patch (current 3rd patch) to suit that [Dave]
> > > 
> > > v2:
> > > - break the first patch into several
> > > - fix a QEMUFile leak
> > > 
> > > Please review.  Thanks,
> > Hi Peter,
> 
> Hi, Balamuruhan,
> 
> Glad to know that you are playing with this stuff on ppc.  I think the
> major steps are correct, though...
> 

Thank you, Peter, for correcting my mistake. It works like a charm.
Nice feature!

Tested-by: Balamuruhan S <address@hidden>

> > 
> > I have applied this patchset on top of upstream QEMU to test the postcopy
> > pause/recover feature on PowerPC.
> > 
> > I used an NFS-shared qcow2 image between the source and target hosts.
> > 
> > source:
> > # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> > -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> > -device virtio-blk-pci,drive=rootdisk \
> > -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> > -monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio \
> > -net user -redir tcp:2000::22
> > 
> > To keep the VM under load, I ran stress-ng inside the guest:
> > 
> > # stress-ng --cpu 6 --vm 6 --io 6
> > 
> > target:
> > # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> > -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> > -device virtio-blk-pci,drive=rootdisk \
> > -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> > -monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio \
> > -net user -redir tcp:2001::22 -incoming tcp:0:4445
> > 
> > I enabled postcopy on both the source and destination from the QEMU monitor:
> > 
> > (qemu) migrate_set_capability postcopy-ram on
> > 
> > From the source QEMU monitor:
> > (qemu) migrate -d tcp:10.45.70.203:4445
> 
> [1]
> 
> > (qemu) info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > decompress-error-check: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> > zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
> > release-ram: off block: off return-path: off pause-before-switchover:
> > off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
> > late-block-activate: off 
> > Migration status: active
> > total time: 2331 milliseconds
> > expected downtime: 300 milliseconds
> > setup: 65 milliseconds
> > transferred ram: 38914 kbytes
> > throughput: 273.16 mbps
> > remaining ram: 67063784 kbytes
> > total ram: 67109120 kbytes
> > duplicate: 1627 pages
> > skipped: 0 pages
> > normal: 9706 pages
> > normal bytes: 38824 kbytes
> > dirty sync count: 1
> > page size: 4 kbytes
> > multifd bytes: 0 kbytes
> > 
> > I triggered postcopy from the source:
> > (qemu) migrate_start_postcopy
> > 
> > After triggering postcopy from the source, I tried to pause the postcopy
> > migration on the target:
> > 
> > (qemu) migrate_pause
> > 
> > On the target I see the error:
> > error while loading state section id 4(ram)
> > qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> > 
> > On the source I see the error:
> > qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> > 
> > Later I tried to recover from the target monitor:
> > (qemu) migrate_recover qemu+ssh://10.45.70.203/system
> 
> ... here, is that URI for libvirt only?
> 
> Normally I'd use something similar to [1] above.
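> For instance, something along these lines on the destination (the port
> 4446 here is just an arbitrary example, not taken from your log):
> 
>   (qemu) migrate_recover tcp:0:4446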
> 
> > Migrate recovery is triggered already
> 
> And this means that you have already sent one recovery command
> beforehand.  In the future we'd better allow the recovery command to be
> run more than once (in case the first one was mistyped...).
> 
> > 
> > but on the source it still remains in the postcopy-paused state:
> > (qemu) info migrate
> > globals:
> > store-global-state: on
> > only-migratable: off
> > send-configuration: on
> > send-section-footer: on
> > decompress-error-check: on
> > capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> > zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
> > release-ram: off block: off return-path: off pause-before-switchover:
> > off x-multifd: off dirty-bitmaps: off postcopy-blocktime: off
> > late-block-activate: off 
> > Migration status: postcopy-paused
> > total time: 222841 milliseconds
> > expected downtime: 382991 milliseconds
> > setup: 65 milliseconds
> > transferred ram: 385270 kbytes
> > throughput: 265.06 mbps
> > remaining ram: 8150528 kbytes
> > total ram: 67109120 kbytes
> > duplicate: 14679647 pages
> > skipped: 0 pages
> > normal: 63937 pages
> > normal bytes: 255748 kbytes
> > dirty sync count: 2
> > page size: 4 kbytes
> > multifd bytes: 0 kbytes
> > dirty pages rate: 854740 pages
> > postcopy request count: 374
> > 
> > Later I also tried to recover postcopy from the source monitor:
> > (qemu) migrate_recover qemu+ssh://10.45.193.21/system
> 
> This command should be run on the destination side only.  Here the
> "migrate-recover" command on the destination will start a new listening
> port there, waiting for the migration to be continued.  Then after that
> command we need an extra command on the source to start the recovery:
> 
>   (HMP) migrate -r $URI
> 
> Here $URI should be the one you specified in the "migrate-recover"
> command on the destination machine.
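> 
> So a minimal recovery sequence would look roughly like this (port 4446 is
> again just an arbitrary example, and 10.45.70.203 is the destination host
> from [1] above):
> 
>   (dest qemu) migrate_recover tcp:0:4446
>   (src qemu)  migrate -r tcp:10.45.70.203:4446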
> 
> > Migrate recover can only be run when postcopy is paused.
> 
> I can try to fix up this error.  Basically we shouldn't allow this
> command to be run on the source machine.

Sure, :+1:

> 
> > 
> > It looks to be broken; please help me if I missed something
> > in this test.
> 
> Btw, I'm currently writing up a unit test for postcopy recovery; that
> could be a good reference for the new feature.  Meanwhile I think I
> should write up some documentation too afterwards.

Fine, I am also working on writing test scenarios in tp-qemu using Avocado-VT
for the postcopy pause/recover and multifd features.

-- Bala
> 
> Regards,
> 
> > 
> > Thank you,
> > Bala
> > > 
> > > Peter Xu (4):
> > >   migration: delay postcopy paused state
> > >   migration: move income process out of multifd
> > >   migration: unbreak postcopy recovery
> > >   migration: unify incoming processing
> > > 
> > >  migration/ram.h       |  2 +-
> > >  migration/exec.c      |  3 ---
> > >  migration/fd.c        |  3 ---
> > >  migration/migration.c | 44 ++++++++++++++++++++++++++++++++++++-------
> > >  migration/ram.c       | 11 +++++------
> > >  migration/savevm.c    |  6 +++---
> > >  migration/socket.c    |  5 -----
> > >  7 files changed, 46 insertions(+), 28 deletions(-)
> > > 
> > > -- 
> > > 2.17.1
> > > 
> > > 
> > 
> 
> -- 
> Peter Xu
> 



