qemu-devel



From: zhanghailiang
Subject: Re: [Qemu-devel] [Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Thu, 2 Apr 2015 19:52:25 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:31.0) Gecko/20100101 Thunderbird/31.1.1

On 2015/4/1 3:06, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
On 2015/3/30 15:59, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
On 2015/3/27 18:18, Dr. David Alan Gilbert wrote:
* zhanghailiang (address@hidden) wrote:
On 2015/3/26 11:52, Li Zhijian wrote:
On 03/26/2015 11:12 AM, Wen Congyang wrote:
On 03/25/2015 05:50 PM, Juan Quintela wrote:
zhanghailiang<address@hidden>  wrote:
Hi all,

We found that, sometimes, the content of the VM's memory is inconsistent between 
the Source side and the Destination side
when we check it just after finishing migration but before the VM continues to run.

We used a patch like the one below to find this issue; you can find it in the attachment.
Steps to reproduce:

(1) Compile QEMU:
  ./configure --target-list=x86_64-softmmu  --extra-ldflags="-lssl" && make

(2) Command and output:
SRC: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock 
-netdev tap,id=hn0 -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot c -drive 
file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc 
:7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio
Could you try to reproduce:
- without vhost
- without virtio-net
- cache=unsafe is going to give you trouble, but trouble should only
   happen after migration of pages has finished.
If I use ide disk, it doesn't happen.
Even if I use virtio-net with vhost=on, it still doesn't happen. I guess
it is because I migrate the guest when it is booting. The virtio net
device is not used in this case.
Er? It reproduces with my IDE disk.
There is no virtio device at all; my command line is like below:

x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock -net none
-boot c -drive file=/home/lizj/ubuntu.raw -vnc :7 -m 2048 -smp 2 -machine
usb=off -no-user-config -nodefaults -monitor stdio -vga std

it seems easy to reproduce this issue with the following steps in an _ubuntu_ guest
1. on the source side, choose memtest in grub
2. do live migration
3. exit memtest (press Esc while the memory test is running)
4. wait for migration to complete


Yes, it is a thorny problem. It is indeed easy to reproduce, just by following
your steps above.

This is my test result: (I also test accel=tcg, it can be reproduced also.)
Source side:
# x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off 
-no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive 
file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device 
cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio
(qemu) ACPI_BUILD: init ACPI tables
ACPI_BUILD: init ACPI tables
migrate tcp:9.61.1.8:3004
ACPI_BUILD: init ACPI tables
before cpu_synchronize_all_states
5a8f72d66732cac80d6a0d5713654c0e
md_host : before saving ram complete
5a8f72d66732cac80d6a0d5713654c0e
md_host : after saving ram complete
5a8f72d66732cac80d6a0d5713654c0e
(qemu)

Destination side:
# x86_64-softmmu/qemu-system-x86_64 -machine pc-i440fx-2.3,accel=kvm,usb=off 
-no-user-config -nodefaults  -cpu qemu64,-kvmclock -boot c -drive 
file=/mnt/sdb/pure_IMG/ubuntu/ubuntu_14.04_server_64_2U_raw -device 
cirrus-vga,id=video0,vgamem_mb=8 -vnc :7 -m 2048 -smp 2 -monitor stdio 
-incoming tcp:0:3004
(qemu) QEMU_VM_SECTION_END, after loading ram
d7cb0d8a4bdd1557fb0e78baee50c986
md_host : after loading all vmstate
d7cb0d8a4bdd1557fb0e78baee50c986
md_host : after cpu_synchronize_all_post_init
d7cb0d8a4bdd1557fb0e78baee50c986

Hmm, that's not good.  I suggest you md5 each of the RAMBlocks individually,
to see if it's main RAM that's different or something more subtle like
video RAM.


Er, all my previous tests md5'd the 'pc.ram' block only.

But then maybe it's easier just to dump the whole of RAM to file
and byte compare it (hexdump the two dumps and diff ?)

Hmm, we also used memcmp to compare every page, but the differing addresses
seem to be random.

Besides, in our previous tests, we found it seems easier to reproduce
when migration occurs during the VM's start-up or reboot process.

Is it possible that some devices get special treatment during VM start-up
which may miss setting the dirty bitmap?

I don't think there should be, but the code paths used during startup are
probably much less tested with migration.  I'm sure the startup code
uses different parts of device emulation.   I do know we have some bugs

Er, maybe there is a special case:

During the VM's start-up, I found that the KVM slots changed many times; it was a
process of smashing the total memory space into smaller slots.

If some pages were dirtied and their bits set in the dirty bitmap in the KVM module,
but we didn't sync the bitmap to QEMU user-space before the slot was smashed,
its previous bitmap is destroyed along with the slot.
The dirty bits recorded in the previous KVM slot may therefore be missed.

What's your opinion? Can the situation I described above happen?

The log below was grabbed when I tried to figure out a very similar problem
(some pages missing their dirty-bitmap setting) that we found in COLO.
Occasionally, there will be an error report on the SLAVE side:

     qemu: warning: error while loading state for instance 0x0 of device
     'kvm-tpr-opt'
     qemu-system-x86_64: loadvm failed

We found that it is related to three addresses (gpa: 0xca000, 0xcb000, 0xcc000,
which are the addresses of 'kvmvapic.rom'?), and sometimes
their corresponding dirty bits will be missed on the Master side, because their
KVMSlot is destroyed before we sync its dirty bitmap to QEMU.

(I'm still not quite sure if this can also happen in common migration; I will
try to test it in normal migration.)

Hi,

We have found two places (bugs) that miss setting the migration bitmap for dirty
pages.
The virtio-blk related one can be fixed by Wen Congyang's patch; you can find his
reply in the list.
The 'kvm-tpr-opt' related one can be fixed by the following patch.

Thanks,
zhang

From 0c63687d0f14f928d6eb4903022a7981db6ba59f Mon Sep 17 00:00:00 2001
From: zhanghailiang <address@hidden>
Date: Thu, 2 Apr 2015 19:26:31 +0000
Subject: [PATCH] kvm-all: Sync dirty-bitmap from kvm before kvm destroy the
 corresponding dirty_bitmap

Sometimes we destroy the dirty_bitmap in kvm_memory_slot before any sync action
occurs; the bits set in that dirty_bitmap are lost, which leads the corresponding
dirty pages to be missed in migration.

This usually happens when doing migration during the VM's start-up or reboot.

Signed-off-by: zhanghailiang <address@hidden>
---
 exec.c    | 2 +-
 kvm-all.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index 874ecfc..4b1b39b 100644
--- a/exec.c
+++ b/exec.c
@@ -59,7 +59,7 @@
 //#define DEBUG_SUBPAGE

 #if !defined(CONFIG_USER_ONLY)
-static bool in_migration;
+bool in_migration;

 /* ram_list is read under rcu_read_lock()/rcu_read_unlock().  Writes
  * are protected by the ramlist lock.
diff --git a/kvm-all.c b/kvm-all.c
index 335438a..dd75eff 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -128,6 +128,8 @@ bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 bool kvm_vm_attributes_allowed;

+extern bool in_migration;
+
 static const KVMCapabilityInfo kvm_required_capabilites[] = {
     KVM_CAP_INFO(USER_MEMORY),
     KVM_CAP_INFO(DESTROY_MEMORY_REGION_WORKS),
@@ -715,7 +717,7 @@ static void kvm_set_phys_mem(MemoryRegionSection *section, bool add)

         old = *mem;

-        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
+        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES || in_migration) {
             kvm_physical_sync_dirty_bitmap(section);
         }

--

Ah, kvm-tpr-opt.
I also hit this today with colo; it didn't happen on a new machine, but it 
happens on an older machine, or in
this case a nested setup.

-global kvm-apic.vapic=false

Turns the vapic off, and seems to have got me further.

I don't understand the details of kvm-tpr-opt, but it's a hack for speeding up
access to the TPR (Task Priority Register), which apparently Windows touches a lot
and which used to slow things down. The hack does involve modifying some stuff on
the fly, and it doesn't happen if the host has some newer features (TPR shadow,
I think; also look for flexpriority).

Dave

Thanks,
zhanghailiang

/var/log/message:
Mar 31 18:32:53 master kernel: [15908.524721] memslot dirty bitmap of '4' was destroyed, base_gfn 0x100, npages 524032
Mar 31 18:32:53 master kernel: [15908.657853] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.665105] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.672360] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.682058] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfebc0, npages 16
Mar 31 18:32:53 master kernel: [15908.849527] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.856845] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.864161] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.872676] memslot dirty bitmap of '4' was destroyed, base_gfn 0xfeb80, npages 64
Mar 31 18:32:53 master kernel: [15908.882694] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.889948] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.897202] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.911652] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.919543] +++ gfn=0xca is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.927419] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.935317] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.943209] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.951083] +++ gfn=0xcb is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.958971] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.966837] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.974707] +++ gfn=0xcc is marked dirty, id=2, base_gfn=0xc0, npages=524096
Mar 31 18:32:53 master kernel: [15908.988470] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 524096
Mar 31 18:32:53 master kernel: [15909.002403] kvm: zapping shadow pages for mmio generation wraparound
Mar 31 18:32:53 master kernel: [15909.002523] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 8
Mar 31 18:32:53 master kernel: [15909.010110] memslot dirty bitmap of '4' was destroyed, base_gfn 0xc8, npages 524088
Mar 31 18:32:53 master kernel: [15909.023988] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 3
Mar 31 18:32:53 master kernel: [15909.031594] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd0, npages 524080
Mar 31 18:32:53 master kernel: [15909.044708] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 11
Mar 31 18:32:53 master kernel: [15909.052392] memslot dirty bitmap of '6' was destroyed, base_gfn 0xd8, npages 524072
Mar 31 18:32:53 master kernel: [15909.065651] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 19
Mar 31 18:32:53 master kernel: [15909.073329] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe0, npages 524064
Mar 31 18:32:53 master kernel: [15909.086379] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 27
Mar 31 18:32:53 master kernel: [15909.094084] memslot dirty bitmap of '6' was destroyed, base_gfn 0xe8, npages 524056
Mar 31 18:32:53 master kernel: [15909.107354] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 524052
Mar 31 18:32:54 master dhcpcd[5408]: eth0: timed out
Mar 31 18:32:54 master dhcpcd[5408]: eth0: trying to use old lease in `/var/lib/dhcpcd/dhcpcd-eth0.info'
Mar 31 18:32:54 master dhcpcd[5408]: eth0: lease expired 30265 seconds ago
Mar 31 18:32:54 master dhcpcd[5408]: eth0: broadcasting for a lease
Mar 31 18:33:02 master qemu-system-x86_64: ====qemu: The 0 times do checkpoing===
Mar 31 18:33:02 master qemu-system-x86_64: == migration_bitmap_sync ==
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xca000 (hva=0x7f4fa32ca000), who's bitmap not set ---
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcb000 (hva=0x7f4fa32cb000), who's bitmap not set ---
Mar 31 18:33:02 master qemu-system-x86_64: qemu:  --- addr=0xcc000 (hva=0x7f4fa32cc000), who's bitmap not set ---
Mar 31 18:33:05 master kernel: [15921.057246] device eth2 left promiscuous mode
Mar 31 18:33:07 master avahi-daemon[5773]: Withdrawing address record for fe80::90be:cfff:fe9e:f03e on tap0.
Mar 31 18:33:07 master kernel: [15922.513313] br0: port 2(tap0) entered disabled state
Mar 31 18:33:07 master kernel: [15922.513480] device tap0 left promiscuous mode
Mar 31 18:33:07 master kernel: [15922.513508] br0: port 2(tap0) entered disabled state
Mar 31 18:33:07 master kernel: [15922.562591] memslot dirty bitmap of '8' was destroyed, base_gfn 0x100, npages 524032
Mar 31 18:33:07 master kernel: [15922.570948] memslot dirty bitmap of '3' was destroyed, base_gfn 0xfd000, npages 4096
Mar 31 18:33:07 master kernel: [15922.578967] memslot dirty bitmap of '0' was destroyed, base_gfn 0x0, npages 160
Mar 31 18:33:07 master kernel: [15922.586556] memslot dirty bitmap of '1' was destroyed, base_gfn 0xfffc0, npages 64
Mar 31 18:33:07 master kernel: [15922.586574] memslot dirty bitmap of '5' was destroyed, base_gfn 0xcd, npages 31
Mar 31 18:33:07 master kernel: [15922.586575] memslot dirty bitmap of '7' was destroyed, base_gfn 0xf0, npages 16
Mar 31 18:33:07 master kernel: [15922.586577] memslot dirty bitmap of '2' was destroyed, base_gfn 0xc0, npages 10
Mar 31 18:33:07 master kernel: [15922.586578] memslot dirty bitmap of '6' was destroyed, base_gfn 0xec, npages 4
Mar 31 18:33:07 master kernel: [15922.586579] memslot dirty bitmap of '4' was destroyed, base_gfn 0xca, npages 3

PS:
QEMU:
static void migration_bitmap_sync(void)
{
     trace_migration_bitmap_sync_end(migration_dirty_pages
                                     - num_dirty_pages_init);
     num_dirty_pages_period += migration_dirty_pages - num_dirty_pages_init;
     end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    syslog(LOG_INFO, "== migration_bitmap_sync ==");
+    check_bitmap();

+void check_bitmap(void)
+{
+    int i;
+    char *host;
+    ram_addr_t addr[3] = { 0xca000, 0xcb000, 0xcc000 };
+    RAMBlock *block = NULL;
+    int ret;
+
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+    for (i = 0; i < 3; i++) {
+        unsigned long nr = block->mr->ram_addr + (addr[i] >> TARGET_PAGE_BITS);
+        host =  memory_region_get_ram_ptr(block->mr) + addr[i];
+        ret = test_bit(nr, migration_bitmap);
+        if (ret == 0) {
+            syslog(LOG_INFO, "qemu:  --- addr=0x%llx (hva=%p), who's bitmap not set ---\n",
+                   addr[i], host);
+        } else {
+            syslog(LOG_INFO, "qemu: +++ OK, addr=0x%llx (hva=%p), who's bitmap is set +++\n",
+                   addr[i], host);
+        }
+    }
+}

KVM:
static void mark_page_dirty_in_slot(struct kvm *kvm,
                    struct kvm_memory_slot *memslot,
                    gfn_t gfn)
{
+    if ((gfn == 0xca || gfn == 0xcb || gfn == 0xcc) && !memslot->dirty_bitmap) {
+        printk("oops, dirty_bitmap is null, gfn=0x%llx\n", gfn);
+    }
     if (memslot && memslot->dirty_bitmap) {
         unsigned long rel_gfn = gfn - memslot->base_gfn;
+        if (gfn == 0xca || gfn == 0xcb || gfn == 0xcc) {
+            printk("+++ gfn=0x%llx is marked dirty, id=%d, base_gfn=0x%llx, npages=%d\n",
+                   gfn, memslot->id, memslot->base_gfn, memslot->npages);
+        }
         set_bit_le(rel_gfn, memslot->dirty_bitmap);
     }
}

static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
{
     if (!memslot->dirty_bitmap)
         return;
+    if (memslot->id != 9) {
+        printk("memslot dirty bitmap of '%d' was destroyed, base_gfn 0x%llx, npages %lld\n",
+               memslot->id, memslot->base_gfn, memslot->npages);
+    }
     kvm_kvfree(memslot->dirty_bitmap);
     memslot->dirty_bitmap = NULL;
}


filed against migration during Windows boot; I'd not considered that it might
be devices not updating the bitmap.

Dave



Thanks,
zhanghailiang



What kind of load were you having when reproducing this issue?
Just to confirm, you have been able to reproduce this without COLO
patches, right?

(qemu) migrate tcp:192.168.3.8:3004
before saving ram complete
ff703f6889ab8701e4e040872d079a28
md_host : after saving ram complete
ff703f6889ab8701e4e040872d079a28

DST: # x86_64-softmmu/qemu-system-x86_64 -enable-kvm -cpu qemu64,-kvmclock 
-netdev tap,id=hn0,vhost=on -device virtio-net-pci,id=net-pci0,netdev=hn0 -boot 
c -drive 
file=/mnt/sdb/pure_IMG/sles/sles11_sp3.img,if=none,id=drive-virtio-disk0,cache=unsafe
 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -vnc 
:7 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio 
-incoming tcp:0:3004
(qemu) QEMU_VM_SECTION_END, after loading ram
230e1e68ece9cd4e769630e1bcb5ddfb
md_host : after loading all vmstate
230e1e68ece9cd4e769630e1bcb5ddfb
md_host : after cpu_synchronize_all_post_init
230e1e68ece9cd4e769630e1bcb5ddfb

This happens occasionally, and it is easier to reproduce when issuing the 
migration command during the VM's startup time.
OK, a couple of things.  Memory doesn't have to be exactly identical.
Virtio devices in particular do funny things on "post-load".  There
aren't guarantees for that as far as I know; we should end with an
equivalent device state in memory.

We have done further tests and found that some pages have been dirtied but their
corresponding migration_bitmap bits are not set.
We can't figure out which module of QEMU has missed setting the bitmap when
dirtying the VM's pages;
it is very difficult for us to trace all the actions that dirty the VM's pages.
This seems to point to a bug in one of the devices.

Actually, the first time we found this problem was during COLO FT development, 
and it triggered some strange issues in
the VM which all pointed to inconsistency of the VM's memory. (We have tried 
saving all of the VM's memory to the slave side every time
we do a checkpoint in COLO FT, and then everything is OK.)

Is it OK for some pages not to be transferred to the destination when doing
migration? Or is it a bug?
Pages transferred should be the same, after device state transmission is
when things could change.

This issue has blocked our COLO development... :(

Any help will be greatly appreciated!
Later, Juan.






--
Dr. David Alan Gilbert / address@hidden / Manchester, UK












