Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero
From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero
Date: Thu, 25 May 2017 08:36:09 +0800
User-agent: Mutt/1.5.24 (2015-08-30)
On Wed, May 24, 2017 at 01:02:25PM +0000, Felipe Franciosi wrote:
>
> > On 23 May 2017, at 05:27, Peter Xu <address@hidden> wrote:
> >
> > On Fri, May 19, 2017 at 10:59:02PM +0100, Felipe Franciosi wrote:
> >> The first time migration_bitmap_sync() is called, bytes_xfer_prev is set
> >> to ram_state.bytes_transferred which is, at this point, zero. The next
> >> time migration_bitmap_sync() is called, an iteration has happened and
> >> bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second
> >> has passed, so the auto converge logic will be triggered and
> >> bytes_xfer_now will also be set to 'x' bytes.
> >>
> >> This condition is currently masked by dirty_rate_high_cnt, which will
> >> wait for a few iterations before throttling. It would otherwise always
> >> assume zero bytes have been copied and therefore throttle the guest
> >> (possibly) prematurely.
> >>
> >> Given bytes_xfer_prev is only used by the auto convergence logic, it
> >> makes sense to only set its value after a check has been made against
> >> bytes_xfer_now.
> >>
> >> Signed-off-by: Felipe Franciosi <address@hidden>
> >> ---
> >> migration/ram.c | 4 ----
> >> 1 file changed, 4 deletions(-)
> >>
> >> diff --git a/migration/ram.c b/migration/ram.c
> >> index f59fdd4..793af39 100644
> >> --- a/migration/ram.c
> >> +++ b/migration/ram.c
> >> @@ -670,10 +670,6 @@ static void migration_bitmap_sync(RAMState *rs)
> >>
> >>      rs->bitmap_sync_count++;
> >>
> >> -    if (!rs->bytes_xfer_prev) {
> >> -        rs->bytes_xfer_prev = ram_bytes_transferred();
> >> -    }
> >> -
> >>      if (!rs->time_last_bitmap_sync) {
> >>          rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >>      }
> >> --
> >> 1.9.5
> >>
> >>
> >
> > I feel like this patch wants to initialize bytes_xfer_prev correctly,
> > but I still see a problem. E.g., when the user enables auto-converge
> > during migration, the first iteration will always have a very small
> > bytes_xfer_prev (with this patch, it'll be zero) against a very big
> > bytes_xfer_now (which is the ram_bytes_transferred() value).
>
> Interesting point. Worth noting, though, that's no different from what
> happens today anyway (bytes_xfer_prev would be initialised to the first
> non-zero bytes_transferred). I therefore don't think it should block or
> delay this patch's acceptance.
>
> > If so, do you think squashing the change below into the current one
> > would be more accurate?
>
> As a matter of fact I had a different idea (below) to fix what you are
> describing. I was still experimenting with this code, so I haven't sent a
> patch yet, but I'm going to send a series soon for comments. Basically, I
> think some other changes are required to make sure these numbers are correct:
I agree on most points.
>
> 1) dirty_rate_high_cnt++ >= 2 should be ++dirty_rate_high_cnt >= 2
> - The original commit msg from 070afca25 (Jason J. Herne) says that
> convergence should be triggered after two passes. Current code does it after
> three passes (and four passes the first time around; see number 2 below).
> - I personally feel this counter should go away altogether. If the migration
> is not converging and the VM is going to be throttled, there's no point
> stressing the network any further; just start throttling straight away.
>
> 2) dirty_pages_rate should be updated before the autoconverge logic.
> - Right now, we delay throttling by a further iteration, as dirty_pages_rate
> is set after the first pass through the autoconverge logic (it is zero the
> first time around).
> - The "if (rs->dirty_pages_rate &&..." part of the conditional can then be
> removed, as it won't ever be zero.
For this one: why can't dirty_pages_rate be zero? But I agree with you
that the check can be removed, since even if it is zero the next
check would fail as well (rs->num_dirty_pages_period *
TARGET_PAGE_SIZE > (bytes_xfer_now - rs->bytes_xfer_prev) / 2).
>
> 3) bytes_xfer counters should be updated alongside dirty_pages counters (for
> the same period).
> - This fixes the issue you described, as bytes_xfer_* will correspond to the
> period.
>
> I'll send the series shortly. Thoughts in the meantime?
I'll reply to that patchset then. Thanks.
--
Peter Xu