From: | Quan Xu |
Subject: | Re: [Qemu-devel] [PATCH RFC] migration: make sure to run iterate precopy during the bulk stage |
Date: | Tue, 4 Sep 2018 20:48:51 +0800 |
User-agent: | Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.0 |
on 2018/9/4 17:12, Juan Quintela wrote:
Quan Xu <address@hidden> wrote:

From 8dbf7370e7ea1caab0b769d0d4dcdd072d14d421 Mon Sep 17 00:00:00 2001
From: Quan Xu <address@hidden>
Date: Wed, 29 Aug 2018 21:33:14 +0800
Subject: [PATCH RFC] migration: make sure to run iterate precopy during the bulk stage

Since the bulk stage assumes in (migration_bitmap_find_dirty) that every
page is dirty, return a rough total ram as pending size to make sure that
migration thread continues to run iterate precopy during the bulk stage.

Otherwise the downtime grows unpredictably, as migration thread needs to
send both the rest of pages and dirty pages during complete precopy.

Signed-off-by: Quan Xu <address@hidden>
---
 migration/ram.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/ram.c b/migration/ram.c
index 79c8942..cfa304c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3308,7 +3308,8 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
         /* We can do postcopy, and all the data is postcopiable */
         *res_compatible += remaining_size;
     } else {
-        *res_precopy_only += remaining_size;
+        *res_precopy_only += (rs->ram_bulk_stage ?
+                              ram_bytes_total() : remaining_size);
     }
 }

Hi

I don't oppose the change. But what I don't understand is _why_ it is needed (or to say it otherwise, how it worked until now).
I run the migration over a slow network (about 500 Mbps). In my opinion, because the throughput is low, iterate precopy takes more 'breaks' (because of MAX_WAIT). As said in the patch description, the migration thread then has to send both the rest of the pages and the dirty pages during complete precopy. With higher network throughput,
the downtime would still look within an acceptable range, even with both to send.
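
To make the effect of the patch concrete, here is a minimal sketch in C. It is not QEMU code: `SketchRamState`, `sketch_pending` and `sketch_should_iterate` are hypothetical names, and the rule "keep iterating while the pending size is at least a threshold" is an assumption about how the caller uses the pending size. It also assumes the dirty-byte count can be below that threshold while the bulk stage is still running, which is the situation the patch description has in mind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the RAM migration state; not QEMU's RAMState. */
typedef struct {
    bool bulk_stage;       /* true while the first full pass over RAM runs */
    uint64_t total_bytes;  /* size of all guest RAM (ram_bytes_total() in the patch) */
    uint64_t dirty_bytes;  /* bytes currently counted as dirty (remaining_size) */
} SketchRamState;

/* Pending estimate as proposed in the patch: report total RAM during the
 * bulk stage, and the dirty-byte count afterwards. */
static uint64_t sketch_pending(const SketchRamState *rs)
{
    assert(rs != NULL);
    return rs->bulk_stage ? rs->total_bytes : rs->dirty_bytes;
}

/* Assumed caller rule: stay in iterate precopy while the pending size is
 * at least the amount that fits into the allowed downtime. */
static bool sketch_should_iterate(uint64_t pending, uint64_t threshold)
{
    return pending >= threshold;
}
```

With 1000 pages of RAM, 10 pages counted as dirty and a threshold of 100 pages, the sketch keeps iterating during the bulk stage and stops once the bulk stage is over.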
I was wondering about the opposite direction: just initialize the number of dirty pages at the beginning of the loop and then let it decrease for each processed page.
I understand your concern. I also wanted to fix it the way you suggest. However, to me, maintaining another counter during migration would be extra overhead.
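
For comparison, the counter-based alternative could look roughly like the sketch below. This is only an illustration under stated assumptions: `SketchDirtyCounter` and its helpers are hypothetical names, not existing QEMU functions, and pages that get dirtied again during migration are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter of pages still to be sent; not a QEMU structure. */
typedef struct {
    uint64_t dirty_pages;  /* pages not yet processed */
    uint64_t page_size;    /* bytes per page */
} SketchDirtyCounter;

/* At the beginning of the loop every page counts as dirty. */
static void sketch_counter_init(SketchDirtyCounter *c,
                                uint64_t total_pages, uint64_t page_size)
{
    c->dirty_pages = total_pages;
    c->page_size = page_size;
}

/* Called once per processed page; this is the extra bookkeeping. */
static void sketch_page_processed(SketchDirtyCounter *c)
{
    assert(c->dirty_pages > 0);
    c->dirty_pages--;
}

/* The pending size falls out of the counter directly. */
static uint64_t sketch_counter_pending(const SketchDirtyCounter *c)
{
    return c->dirty_pages * c->page_size;
}
```

The pending size is then accurate without a special case for the bulk stage, at the price of one decrement per processed page.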
Quan
I don't remember either how big the speedup was from not walking the bitmap on the 1st stage to start with. Later, Juan.