qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] dirty page count problem


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] dirty page count problem
Date: Fri, 21 Jul 2017 20:07:56 +0100
User-agent: Mutt/1.8.3 (2017-05-23)

* Dr. David Alan Gilbert (address@hidden) wrote:
> Hi,
>   Git bisect is pointing to your patch 084140bd49:
>   exec: fix access to ram_list.dirty_memory when sync dirty bitmap
> 
> trying to diagnose a bug I'm seeing; it looks like the dirty page count
> is wrong for some reason.
> 
> Alex Bennée spotted a problem where the postcopy test would occasionally
> fail under very heavy load;    attaching a debugger and it looks like
> the problem is we have a migration_dirty_page count stuck at 2;
> in the normal migration tests we don't spot this, because 2 pages is
> smaller than the threshold to end migration and so an extra 2 pages
> doesn't block it finishing.   However, with a very
> small downtime setting (like we use in the postcopy test) and with
> very low bandwidth (as when Alex ran the test on a very heavily loaded
> machine) we end up never calling the bitmap sync again and never
> completing the iteration.
> 
> I'm using the following addition to spot the problem:

I think I have the fix for this, or at least the reason; the alignment
check also needs fixing up, but the 08414 patch missed it.

  Alex: Please test the fix included at the bottom

> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e75f1050e4..3ddf884952 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1350,6 +1350,13 @@ static int ram_find_and_save_block(RAMState *rs, bool 
> last_stage)
>          }
>      } while (!pages && again);
> 
> +    if (!pages && !again && pss.complete_round && rs->migration_dirty_pages)
> +    {
> +        /* Should make this fail migration ? */
> +        fprintf(stderr, "%s: no page found, yet dirty_pages=%"PRIu64"\n",
> +                __func__, rs->migration_dirty_pages);
> +    }
> +
>      rs->last_seen_block = pss.block;
>      rs->last_page = pss.page;
> 
> (which I might add as a test to fail a migration)
> 
> That test fails easily even on an unloaded machine:
> tests/postcopy-test
> /x86_64/postcopy: ram_find_and_save_block: no page found, yet dirty_pages=2
> ram_find_and_save_block: no page found, yet dirty_pages=2
> ram_find_and_save_block: no page found, yet dirty_pages=2
> OK
> 
> 
> I'll try and debug where our extra two pages are coming from.

From b2c86818eaaaf790246f21ae21fead903c1d64f0 Mon Sep 17 00:00:00 2001
From: "Dr. David Alan Gilbert" <address@hidden>
Date: Fri, 21 Jul 2017 19:59:01 +0100
Subject: [PATCH] cpu_physical_memory_sync_dirty_bitmap: Fix alignment check

This code has an optimised, word aligned version, and a boring
unaligned version.  Recently 084140bd498909 fixed a missing offset
addition from the core of both versions.  However, the offset isn't
necessarily aligned and thus the choice between the two versions
needs fixing up to also include the offset.

DANGER: Friday evening, post-dinner lightly tested patch.

Symptom:
  A few stuck unsent pages during migration; not normally noticed
unless under very low bandwidth.

Fixes: 084140bd498909
---
 include/exec/ram_addr.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index c04f4f6..c802f7f 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -378,15 +378,16 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock 
*rb,
 {
     ram_addr_t addr;
     unsigned long page = BIT_WORD(start >> TARGET_PAGE_BITS);
+    unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
     uint64_t num_dirty = 0;
     unsigned long *dest = rb->bmap;
 
     /* start address is aligned at the start of a word? */
-    if (((page * BITS_PER_LONG) << TARGET_PAGE_BITS) == start) {
+    if (((word * BITS_PER_LONG) << TARGET_PAGE_BITS) ==
+         (start + rb->offset)) {
         int k;
         int nr = BITS_TO_LONGS(length >> TARGET_PAGE_BITS);
         unsigned long * const *src;
-        unsigned long word = BIT_WORD((start + rb->offset) >> 
TARGET_PAGE_BITS);
         unsigned long idx = (word * BITS_PER_LONG) / DIRTY_MEMORY_BLOCK_SIZE;
         unsigned long offset = BIT_WORD((word * BITS_PER_LONG) %
                                         DIRTY_MEMORY_BLOCK_SIZE);
-- 
1.8.3.1

> Dave
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]