Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
From: Keqian Zhu
Subject: Re: [PATCH v5 00/10] KVM: Dirty ring support (QEMU part)
Date: Thu, 25 Mar 2021 09:21:53 +0800
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.1
Peter,
On 2021/3/24 23:09, Peter Xu wrote:
> On Wed, Mar 24, 2021 at 10:56:22AM +0800, Keqian Zhu wrote:
>> Hi Peter,
>>
>> On 2021/3/23 22:34, Peter Xu wrote:
>>> Keqian,
>>>
>>> On Tue, Mar 23, 2021 at 02:40:43PM +0800, Keqian Zhu wrote:
>>>>>> The second question is that you observed a longer migration time
>>>>>> (55s->73s) when the guest has 24G ram and the dirty rate is 800M/s. I am
>>>>>> not clear about the reason. With dirty ring enabled, QEMU can get dirty
>>>>>> info faster, which means it handles dirty pages more quickly, and the
>>>>>> guest can be throttled, which means dirty pages are generated more
>>>>>> slowly. What's the rationale for the longer migration time?
>>>>>
>>>>> Because dirty ring is more sensitive to dirty rate, while dirty bitmap is more
>>>> Emm... Sorry, I'm still not very clear about this... I think a higher
>>>> dirty rate doesn't cause slower dirty_log_sync compared to legacy bitmap
>>>> mode. Besides, a higher dirty rate means we may have more full-exits,
>>>> which can properly limit the dirty rate. So it seems that dirty ring
>>>> "prefers" a higher dirty rate.
>>>
>>> When I measured the 800MB/s it's in the guest, after throttling.
>>>
>>> Imagine another example: a VM has 1G of memory that it keeps dirtying at
>>> 10GB/s. Dirty logging will need to collect even less for each iteration
>>> because the memory size shrank, and collect even less frequently due to the
>>> high dirty rate; however, dirty ring will use 100% cpu power to collect
>>> dirty pages because the ring keeps full.
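A rough back-of-envelope for the example above (the 4096-entry per-vCPU ring
size below is only an assumption for illustration, not a number from this
thread):

    /* Back-of-envelope for the "1G memory dirtied at 10GB/s" case above. */
    #include <stdio.h>

    int main(void)
    {
        double mem_pages = (1UL << 30) / 4096.0;  /* 1 GiB of 4 KiB pages     */
        double dirty_pps = 10e9 / 4096.0;         /* 10 GB/s in pages/second  */
        double ring_size = 4096;                  /* assumed per-vCPU ring    */

        printf("whole guest re-dirtied every %.2f s\n", mem_pages / dirty_pps);
        printf("ring refills %.2f ms after each reset\n",
               1000.0 * ring_size / dirty_pps);
        /* The ring refills ~1.7 ms after every reset, so the vCPU is almost
         * always taking ring-full exits and collection never catches up.    */
        return 0;
    }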
>> Looks good.
>>
>> We have many places to collect dirty pages: the background reaper, the vCPU
>> exit handler, and the migration thread. I think the migration time is
>> closely related to the migration thread.
>>
>> The migration thread calls kvm_dirty_ring_flush().
>> 1. kvm_cpu_synchronize_kick_all() waits for vCPUs to handle their full-exits.
>> 2. kvm_dirty_ring_reap() collects and resets dirty pages.
>> The above two operations will take more time with a higher dirty rate.
>>
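A minimal standalone model of that two-step flush, just to make the sequence
concrete; the real QEMU code lives elsewhere and the helper names below are
stand-ins for illustration, not the actual functions:

    #include <stdio.h>

    /* Stand-in for kvm_cpu_synchronize_kick_all(): wait until every vCPU has
     * taken its ring-full exit and pushed pending entries into its ring.     */
    static void kick_all_vcpus(void)
    {
        printf("kick: wait for all vCPUs to flush their hardware buffers\n");
    }

    /* Stand-in for kvm_dirty_ring_reap(): collect dirty GFNs into the
     * per-slot bitmaps, then reset the rings (KVM_RESET_DIRTY_RINGS) so the
     * kernel re-write-protects the pages and frees ring space.               */
    static void reap_and_reset_rings(void)
    {
        printf("reap: collect dirty GFNs, then reset the rings\n");
    }

    /* The migration thread's sync path: both steps get slower as the dirty
     * rate rises, which is why the flush cost tracks the dirty rate.         */
    static void dirty_ring_flush(void)
    {
        kick_all_vcpus();        /* step 1 */
        reap_and_reset_rings();  /* step 2 */
    }

    int main(void)
    {
        dirty_ring_flush();
        return 0;
    }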
>> But I suddenly realize that the key problem may not be this. Though we have
>> a separate "reset" operation for the dirty ring, it is actually performed
>> right after we collect the dirty ring into the kvmslot. So dirty ring mode
>> behaves like legacy bitmap mode without manual_dirty_clear.
>>
>> If we could "reset" the dirty ring just before we really handle the dirty
>> pages, we could have a shorter migration time. But the design of dirty ring
>> doesn't allow this, because we must perform the reset to make free space...
>
> This is a very good point.
>
> Dirty ring should have been better in quite some ways already, but from that
> pov, as you said, it goes a bit backwards on reprotection of pages (not to
> mention we currently can't even reset the ring per-vcpu, which seems not to
> fully match the locality that the rings provide; but Paolo and I discussed
> that issue, it's about TLB flush expensiveness, so we still need to think
> more about it..).
>
> Ideally the ring could have been both per-vcpu and bi-directional (then we'll
> have 2*N rings, N=vcpu number), so as to split the state transition into a
> "dirty ring" and a "reprotect ring"; the reprotect ring would then be the
> clear dirty log. That'll look more like virtio's used ring. However, we'll
> still need to think about the TLB flush issue, as Paolo used to mention,
> since it'll exist with any per-vcpu flush model (each reprotect of a page
> will need a tlb flush of all vcpus).
>
> Or.. maybe we can make the flush ring a standalone one, so we need N dirty
> rings and one global flush ring.
Yep, having separate "reprotect" ring(s) is a good idea.
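To make the idea concrete, a purely hypothetical sketch of such a layout (none
of this is an existing KVM/QEMU interface; all names and sizes are invented
for illustration):

    #include <stdint.h>

    #define RING_SIZE 4096                /* entries per ring (assumed)        */

    struct dirty_gfn {
        uint32_t flags;                   /* e.g. dirty / already harvested    */
        uint32_t slot;                    /* memslot id                        */
        uint64_t offset;                  /* page offset within the memslot    */
    };

    /* One producer ring per vCPU: the kernel pushes dirtied GFNs here, which
     * keeps the per-vcpu locality the current dirty ring already provides.   */
    struct vcpu_dirty_ring {
        struct dirty_gfn gfns[RING_SIZE];
    };

    /* A single global "reprotect" (flush) ring -- or one per vCPU for the 2*N
     * variant: userspace pushes a GFN here only after it has really handled
     * the page, so re-write-protection happens just before the page is sent
     * rather than immediately after collection.                              */
    struct reprotect_ring {
        struct dirty_gfn gfns[RING_SIZE];
    };

    /* N dirty rings plus one global reprotect ring, as discussed above.      */
    struct dirty_tracking {
        struct vcpu_dirty_ring *vcpu_rings;   /* N = number of vCPUs */
        struct reprotect_ring   flush_ring;
    };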
>
> Anyway.. Before that, I'd still think the next step should be how to
> integrate qemu to fully leverage the current ring model, so as to be able to
> throttle in a per-vcpu fashion.
>
> The major issues (IMHO) with huge VM migration are:
>
> 1. Convergence
> 2. Responsiveness
>
> Here we'll have a chance to solve (1) by heavily throttling the working vcpu
> threads, while still keeping (2) by not throttling user-interactive threads.
> I'm not sure whether this will really work as expected, but that's just to
> show what I'm thinking about. These may not matter a lot yet with further
> improvements to the ring reset mechanism, which definitely sounds even
> better, but seems orthogonal.
>
> That's also why I think we should still merge this series first, as a
> foundation for the rest.
I see.
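As one possible shape for that per-vcpu throttling (purely illustrative, not
part of this series; the back-off numbers are arbitrary):

    #include <stdint.h>
    #include <unistd.h>

    struct vcpu_throttle {
        uint64_t full_exits;   /* ring-full exits seen this iteration */
        uint64_t sleep_us;     /* current penalty for this vCPU       */
    };

    /* Called from a vCPU's ring-full exit path: only the vCPU that keeps
     * filling its ring (a "working" vCPU) accumulates a penalty, so idle or
     * interactive vCPUs stay responsive.                                    */
    static void throttle_on_ring_full(struct vcpu_throttle *t)
    {
        t->full_exits++;
        t->sleep_us = t->sleep_us ? t->sleep_us * 2 : 100;  /* exponential  */
        if (t->sleep_us > 10000) {
            t->sleep_us = 10000;                            /* cap at 10ms  */
        }
        usleep(t->sleep_us);
    }

    /* Called once per migration iteration: decay the penalty for vCPUs whose
     * dirty rate dropped, so throttling follows each vCPU's own behaviour.  */
    static void throttle_iteration_end(struct vcpu_throttle *t)
    {
        if (t->full_exits == 0) {
            t->sleep_us /= 2;
        }
        t->full_exits = 0;
    }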
>
>>
>>>
>>>>
>>>>> sensitive to memory footprint. In the above 24G mem + 800MB/s dirty rate
>>>>> condition, dirty bitmap seems to be more efficient; say, collecting the
>>>>> dirty bitmap of 24G mem (24G/4K/8=0.75MB) for each migration cycle is
>>>>> fast enough.
>>>>>
>>>>> Not to mention that the current implementation of dirty ring in QEMU is
>>>>> not complete - we still have two more layers of dirty bitmap, so it's
>>>>> actually a mixture of dirty bitmap and dirty ring. This series is more
>>>>> like a POC on the dirty ring interface, so as to let QEMU be able to run
>>>>> on KVM dirty ring. E.g., we won't have the hang issue when getting dirty
>>>>> pages since it's totally async; however, we'll still have some legacy
>>>>> dirty bitmap issues, e.g., memory consumption of userspace dirty bitmaps
>>>>> is still linear to memory footprint.
>>>> The plan looks good and coordinated, but I have a concern. Our dirty ring
>>>> actually depends on the structure of the hardware logging buffer (the PML
>>>> buffer). We can't say it can be properly adapted to all kinds of hardware
>>>> designs in the future.
>>>
>>> Sorry I don't get it - dirty ring can work with pure page wr-protect too?
>> Sure, it can. I just wanted to discuss the many possible kinds of hardware
>> logging buffers. However, I'd like to stop here; at least dirty ring works
>> well with PML. :)
>
> I see your point. That'll be a good topic at least when we'd like to port
> dirty ring to other archs for sure. However, as you see, I hoped we could
> start to use dirty ring first, find issues, fix them, even redesign some of
> it, and make it really beneficial at least on one arch; then it'll make more
> sense to port it, or attract people to port it. :)
>
> QEMU does not have a good solution for huge VM migration yet. Maybe dirty
> ring is a good start for it, maybe not (e.g., with uffd minor mode, postcopy
> has another chance). We'll see...
OK.
Thanks,
Keqian