Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero
From: Felipe Franciosi
Subject: Re: [Qemu-devel] [PATCH] migration: keep bytes_xfer_prev init'd to zero
Date: Thu, 25 May 2017 10:50:46 +0000
> On 25 May 2017, at 01:36, Peter Xu <address@hidden> wrote:
>
> On Wed, May 24, 2017 at 01:02:25PM +0000, Felipe Franciosi wrote:
>>
>>> On 23 May 2017, at 05:27, Peter Xu <address@hidden> wrote:
>>>
>>> On Fri, May 19, 2017 at 10:59:02PM +0100, Felipe Franciosi wrote:
>>>> The first time migration_bitmap_sync() is called, bytes_xfer_prev is set
>>>> to ram_state.bytes_transferred which is, at this point, zero. The next
>>>> time migration_bitmap_sync() is called, an iteration has happened and
>>>> bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second
>>>> has passed, so the auto converge logic will be triggered and
>>>> bytes_xfer_now will also be set to 'x' bytes.
>>>>
>>>> This condition is currently masked by dirty_rate_high_cnt, which will
>>>> wait for a few iterations before throttling. It would otherwise always
>>>> assume zero bytes have been copied and therefore throttle the guest
>>>> (possibly) prematurely.
>>>>
>>>> Given bytes_xfer_prev is only used by the auto convergence logic, it
>>>> makes sense to only set its value after a check has been made against
>>>> bytes_xfer_now.
>>>>
>>>> Signed-off-by: Felipe Franciosi <address@hidden>
>>>> ---
>>>> migration/ram.c | 4 ----
>>>> 1 file changed, 4 deletions(-)
>>>>
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index f59fdd4..793af39 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -670,10 +670,6 @@ static void migration_bitmap_sync(RAMState *rs)
>>>>
>>>> rs->bitmap_sync_count++;
>>>>
>>>> - if (!rs->bytes_xfer_prev) {
>>>> - rs->bytes_xfer_prev = ram_bytes_transferred();
>>>> - }
>>>> -
>>>> if (!rs->time_last_bitmap_sync) {
>>>> rs->time_last_bitmap_sync = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>>>> }
>>>> --
>>>> 1.9.5
>>>>
>>>>
>>>
>>> I feel like this patch wants to correctly initialize bytes_xfer_prev,
>>> but I still see a problem. E.g., when the user enables auto-convergence
>>> during migration, the first iteration will always have a very
>>> small bytes_xfer_prev (with this patch, it'll be zero) against a very big
>>> bytes_xfer_now (which is the ram_bytes_transferred() value).
>>
>> Interesting point. Worth noting, that's no different from what happens today
>> anyway (bytes_xfer_prev would be initialised to the first non-zero
>> bytes_transferred). I therefore don't think it should stop or slow this
>> patch's acceptance.
>>
>>> If so, do
>>> you think squash below change together with current one would be more
>>> accurate?
>>
>> As a matter of fact I had a different idea (below) to fix what you are
>> describing. I was still experimenting with this code, so haven't sent a
>> patch yet. But I'm going to send a series soon for comments. Basically, I
>> think some other changes are required to make sure these numbers are correct:
>
> I agree on most points.
>
>>
>> 1) dirty_rate_high_cnt++ >= 2 should be ++dirty_rate_high_cnt >= 2
>> - The original commit msg from 070afca25 (Jason J. Herne) says that
>> convergence should be triggered after two passes. Current code does it after
>> three passes (and four passes the first time around; see number 2 below).
>> - I personally feel this counter should go away altogether. If the migration
>> is not converging and the VM is going to be throttled, there's no point
>> stressing the network any further; just start throttling straight away.
>>
>> 2) dirty_pages_rate should be updated before the autoconverge logic.
>> - Right now, we delay throttling by a further iteration, as dirty_pages_rate
>> is set after the first pass through the autoconverge logic (it is zero the
>> first time around).
>> - The "if (rs->dirty_pages_rate &&..." part of the conditional can then be
>> removed, as it won't ever be zero.
>
> For this one: why dirty_pages_rate cannot be zero?
Yeah, good point. If _nothing_ got dirtied in the period (possible, I suppose),
then the dirty rate would be zero.
> But I agree with
> you that it can be removed since even if it's zero, then the next
> check would fail as well (rs->num_dirty_pages_period *
> TARGET_PAGE_SIZE > (bytes_xfer_now - rs->bytes_xfer_prev) / 2).
Exactly. I suppose that check was introduced to consider the first iteration
where the counter hadn't been initialised. But it's not needed.
>
>>
>> 3) bytes_xfer counters should be updated alongside dirty_pages counters (for
>> the same period).
>> - This fixes the issue you described, as bytes_xfer_* will correspond to the
>> period.
>>
>> I'll send the series shortly. Thoughts in the meantime?
>
> I'll reply to that patchset then. Thanks.
Thanks for the quick feedback on this. I've included this change in the
patchset, so let's continue the conversation there.
Felipe
>
> --
> Peter Xu