From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH for-3.0 0/9] migration: postcopy recovery unit test, bug fixes
Date: Thu, 12 Jul 2018 09:50:32 +0100
User-agent: Mutt/1.10.0 (2018-05-17)

* Balamuruhan S (address@hidden) wrote:
> On Fri, Jul 06, 2018 at 11:56:59AM +0100, Dr. David Alan Gilbert wrote:
> > * Dr. David Alan Gilbert (address@hidden) wrote:
> > > * Peter Xu (address@hidden) wrote:
> > > > Based-on: <address@hidden>
> > > >
> > > > Based on the series to unbreak postcopy:
> > > > Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> > > > Message-Id: <address@hidden>
> > > >
> > > > This series introduces a new postcopy recovery test. The new test
> > > > actually helped me identify two bugs there, so fix them as well
> > > > before the 3.0 release.
> > > >
> > > > Patch 1: a trivial cleanup for the existing postcopy ram load, which I
> > > > found a bit confusing while debugging the problem.
> > > >
> > > > Patch 2-3: two bug fixes that address different issues. Please see
> > > > the commit log for more information.
> > > >
> > > > Patch 4-9: add the postcopy recovery unit test.
> > > >
> > > > Please review. Thanks,
> > >
> > > Queued
> >
> > Hi Peter,
> > There's a problem in there somewhere; I'm getting
> > an intermittent failure of the test if I run a make check -j 8 on my
> > laptop. Just running two copies of tests/migration-test in parallel
> > sometimes triggers it (but not if I turn on QTEST_LOG!).
> > But it's always failing with:
> >
> >
> > ERROR:/home/dgilbert/git/migpull/tests/migration-test.c:373:migrate_recover:
> > assertion failed: (qdict_haskey(rsp, "return"))
> >
> > Dave
>
> Hi Peter, Dave,

Hi Bala,

> I have applied this patchset in upstream Qemu to test postcopy
> pause/recovery.

Are you still seeing this with the set that got merged into 3.0-rc0?
The second of your errors looks similar to problems with the race
we had before Peter fixed it; but the set that I merged passed a 'make
check' on a Power box.
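
Running two copies of tests/migration-test in parallel was what triggered
the earlier race for me, so repeating that is probably the quickest way to
check. A rough sketch of how that could be scripted follows; `run_parallel`
is a made-up helper name, and the command and environment you pass in are up
to your setup.

```python
import subprocess
import sys


def run_parallel(cmd, copies=2, rounds=20, env=None):
    """Run `copies` instances of cmd at once, `rounds` times.

    Returns the total number of instances that exited non-zero.
    """
    failures = 0
    for i in range(rounds):
        procs = [
            subprocess.Popen(cmd, env=env,
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
            for _ in range(copies)
        ]
        # wait() gives the exit status; a negative value means the process
        # was killed by a signal (e.g. abort() after a failed assertion).
        codes = [p.wait() for p in procs]
        bad = sum(1 for c in codes if c != 0)
        if bad:
            print(f"round {i}: exit codes {codes}", file=sys.stderr)
        failures += bad
    return failures
```

For example, from the build directory, call
`run_parallel(["./tests/migration-test"], env=dict(os.environ, QTEST_QEMU_BINARY="./ppc64-softmmu/qemu-system-ppc64"))`
(with `import os`); the paths are the ones quoted in this thread and are only
an assumption about your tree. A non-zero result means at least one copy
failed.
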
Dave
> I observed an error after triggering the recovery command from the source
> monitor: the target is lost and the source remains in the `postcopy-paused`
> state.
>
> Please find my observation below,
>
> Source:
>
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1234,server,nowait -net nic,model=virtio -net user \
> -redir tcp:2000::22
>
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
>
> Source Monitor:
>
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_set_parameter max-postcopy-bandwidth 4096
> (qemu) migrate -d tcp:127.0.0.1:4444
> (qemu) migrate_start_postcopy
> (qemu) migrate_pause
> (qemu) migrate -r tcp:127.0.0.1:4446
>
> After triggering recovery, the target is lost with the error shown below
> and the source remains in the `postcopy-paused` state:
>
> (qemu) info migrate
> globals:
> store-global-state: on
> only-migratable: off
> send-configuration: on
> send-section-footer: on
> decompress-error-check: on
> capabilities: xbzrle: off rdma-pin-all: off auto-converge: off
> zero-blocks: off \
> compress: off events: off postcopy-ram: on x-colo: off release-ram: off
> block: off return-path: off pause-before-switchover: off x-multifd: off \
> dirty-bitmaps: off
> postcopy-blocktime: off late-block-activate: off
> Migration status: postcopy-recover
> total time: 78818 milliseconds
> expected downtime: 300 milliseconds
> setup: 169 milliseconds
> transferred ram: 177749 kbytes
> throughput: 63.72 mbps
> remaining ram: 28061376 kbytes
> total ram: 67109120 kbytes
> duplicate: 9742102 pages
> skipped: 0 pages
> normal: 22986 pages
> normal bytes: 91944 kbytes
> dirty sync count: 2
> page size: 4 kbytes
> multifd bytes: 0 kbytes
> dirty pages rate: 1273187 pages
> postcopy request count: 236
>
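
When comparing the two sides it can be handy to pull individual fields out
of the `info migrate` text rather than eyeballing it. A minimal sketch,
assuming the plain "key: value" layout pasted above; `parse_info_migrate` is
a hypothetical helper, not anything in the QEMU tree. Note that the paste
above reports `postcopy-recover` as the status.

```python
def parse_info_migrate(text):
    """Return a dict of 'key: value' pairs from HMP `info migrate` output."""
    fields = {}
    for line in text.splitlines():
        # Drop mail quoting ("> ") and surrounding whitespace.
        line = line.lstrip("> ").strip()
        key, sep, value = line.partition(": ")
        # Lines without a value (e.g. "globals:") are skipped.
        if sep and value:
            fields[key] = value
    return fields


sample = """\
Migration status: postcopy-recover
total time: 78818 milliseconds
remaining ram: 28061376 kbytes
postcopy request count: 236
"""
info = parse_info_migrate(sample)
print(info["Migration status"])
```
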
>
> Target:
>
> # ppc64-softmmu/qemu-system-ppc64 --enable-kvm --nographic -vga none \
> -machine pseries -m 64G,slots=128,maxmem=128G -smp 16,maxcpus=32 \
> -device virtio-blk-pci,drive=rootdisk \
> -drive file=/home/bala/sharing/hostos-ppc64le.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \
> -monitor telnet:127.0.0.1:1235,server,nowait -net nic,model=virtio -net user \
> -redir tcp:2001::22 -incoming tcp:127.0.0.1:4444
>
>
> qemu-system-ppc64: check_section_footer: Read section footer failed: -5
> qemu-system-ppc64: Detected IO failure for postcopy. Migration paused.
> qemu-system-ppc64: Not a migration stream
> qemu-system-ppc64: load of migration failed: Invalid argument
>
>
> Target Monitor:
>
> (qemu) migrate_set_capability postcopy-ram on
> (qemu) migrate_recover tcp:127.0.0.1:4446
> (qemu) Connection closed by foreign host.
>
> QTest:
>
> I have also run the QTest. The recovery test doesn't complete: it waits
> for the source to reach "completed", but because of this issue the source
> remains in `postcopy-paused`, so
>
> `migrate_postcopy_complete(from, to);`
>
> never returns.
>
> As it did not complete, I cancelled it forcefully:
>
> # time QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 \
>     ./tests/migration-test
> /ppc64/migration/deprecated: OK
> /ppc64/migration/bad_dest: OK
> /ppc64/migration/postcopy/unix: OK
> /ppc64/migration/postcopy/recovery: ^C
>
> real 21m55.176s
> user 2m28.800s
> sys 4m55.980s
>
> -- Bala
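
A wait that never gives up is what turns a stuck migration into a 21 minute
hang. When scripting this outside the qtest, one option is to poll with a
deadline. The sketch below is only an illustration: `wait_for_status` and
the `query_status` callable are hypothetical, standing in for whatever you
use to read the migration status (e.g. a wrapper around `query-migrate`).

```python
import time


def wait_for_status(query_status, wanted, timeout=60.0, interval=0.5,
                    failed=("failed", "cancelled")):
    """Poll query_status() until it returns `wanted`.

    Raises RuntimeError if a terminal failure state is seen, and
    TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = query_status()
        if status == wanted:
            return status
        if status in failed:
            raise RuntimeError(f"migration ended in state {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"still {status!r} after {timeout}s, wanted {wanted!r}")
        time.sleep(interval)
```

With this, a source that stays in `postcopy-paused` produces a TimeoutError
naming the state it was stuck in, instead of blocking until someone hits ^C.
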
> >
> > > > Peter Xu (9):
> > > > migration: simplify check to use qemu file buffer
> > > > migration: loosen recovery check when load vm
> > > > migration: fix incorrect bitmap size calculation
> > > > tests: introduce migrate_postcopy_* helpers
> > > > tests: allow migrate() to take extra flags
> > > > tests: introduce migrate_query*() helpers
> > > > tests: introduce wait_for_migration_status()
> > > > tests: add postcopy recovery test
> > > > tests: hide stderr for postcopy recovery test
> > > >
> > > > migration/ram.c | 21 +++--
> > > > migration/savevm.c | 16 ++--
> > > > tests/migration-test.c | 198 ++++++++++++++++++++++++++++++++---------
> > > > 3 files changed, 176 insertions(+), 59 deletions(-)
> > > >
> > > > --
> > > > 2.17.1
> > > >
> > > >
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK