Re: [Qemu-block] [Qemu-devel] [PATCH 4/6] dirty-bitmaps: clean-up bitmap

From: John Snow
Subject: Re: [Qemu-block] [Qemu-devel] [PATCH 4/6] dirty-bitmaps: clean-up bitmaps loading and migration logic
Date: Wed, 1 Aug 2018 18:28:02 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 08/01/2018 04:47 PM, Denis V. Lunev wrote:
> On 08/01/2018 09:56 PM, John Snow wrote:
>> On 08/01/2018 02:42 PM, Denis V. Lunev wrote:
>>> On 08/01/2018 08:40 PM, Dr. David Alan Gilbert wrote:
>>>> * John Snow (address@hidden) wrote:
>>>>> On 08/01/2018 06:20 AM, Dr. David Alan Gilbert wrote:
>>>>>> * John Snow (address@hidden) wrote:
>>>>>> <snip>
>>>>>>> I'd rather do something like this:
>>>>>>> - Always flush bitmaps to disk on inactivate.
>>>>>> Does that increase the time taken by the inactivate measurably?
>>>>>> If it's small relative to everything else that's fine; it's just I
>>>>>> always worry a little since I think this happens after we've stopped the
>>>>>> CPU on the source, so is part of the 'downtime'.
>>>>>> Dave
>>>>>> --
>>>>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>>>>> I'm worried that if we don't, we're leaving unusable, partially
>>>>> complete files behind us. That's a bad design and we shouldn't push for
>>>>> it just because it's theoretically faster.
>>>> Oh I don't care about theoretical speed; but if it's actually unusably
>>>> slow in practice then it needs fixing.
>>>> Dave
>>> This is not "theoretical" speed. This is real practical speed and
>>> instability.
>> It's theoretical until I see performance testing numbers; do you have
>> any? How much faster does the pivot happen by avoiding making the qcow2
>> consistent on close?
>> I don't argue that it's faster to just simply not write data, but what's
>> not obvious is how much time it actually saves in practice and if that's
>> worth doing unintuitive and undocumented things like "the source file
>> loses bitmaps after a migration because it was faster."
> Also, frankly speaking, I do not understand the goal of this purism.

The goal of my series originally was just to limit some corner cases. At
the time it was not evident that avoiding a flush was a *goal* of that
series rather than a *side-effect* or a means to an end (avoiding
migrating a bitmap over two different channels).

It was not immediately obvious to me that intentionally leaving behind
partially flushed qcow2 files was expected behavior. I still think it's
probably not the best behavior in general, but it's not really
catastrophic either. If you have benchmarks, they would be useful for
showing an obvious benefit to doing something unconventional.

In this case, I *do* consider not writing metadata back out to disk on
close something "unconventional."

Clearly my series missed an important case, so it can't be used
at all, and the status quo is also broken for several cases and also
cannot be used. With your performance concerns in mind, I'm looking at
Vladimir's series again. It might just require some more concise
comments explaining why you're taking the exact approach that you are.


> There are 2 main cases - shared and non-shared storage. On shared
> storage:
> - normally migration is finished successfully. Source is shut down,
>   target is started. The data in the file on shared storage would be
>   __IMMEDIATELY__ marked as stale on target, i.e. you will save CBT
>   on source (with IO over networked FS), load CBT on target (with IO
>   over networked FS), mark CBT as stale (IO again). CBT data written
>   is marked as lost
> - failed migration. OK, we have CBT data written on source, CBT
>   data read on source, CBT data marked stale. Thus any CBT on
>   disk while VM is running is pure overhead.
> The same situation arises when we use non-shared migration. In this
> case the situation is even worse. You save CBT and throw it away
> once migration completes.
> Please also note that CBT saving offers almost no protection
> against power loss: the power would have to be lost at a very
> specific moment for the data to survive, and most likely
> we will have to drop CBT completely.
> Den
