
Re: Question on dirty sync before kvm memslot removal


From: Peter Xu
Subject: Re: Question on dirty sync before kvm memslot removal
Date: Tue, 31 Mar 2020 11:23:14 -0400

On Mon, Mar 30, 2020 at 03:11:53PM +0200, Paolo Bonzini wrote:
> On 27/03/20 16:04, Peter Xu wrote:
> > That makes perfect sense to me, however... What if the vcpu generates
> > dirty bits _after_ we do KVM_GET_DIRTY_LOG but _before_ we send
> > KVM_SET_USER_MEMORY_REGION to remove the memslot?  If the vcpu is in
> > userspace I think it's fine, because the BQL is needed so it won't be
> > able to write; however, vcpus running inside KVM are not restricted by
> > that.  I think those dirty bits will still get lost, but it should be
> > extremely hard to trigger.
> 
> Yes, you've found a bug.
> 
> > I'm not sure whether I missed something above, but if I'm correct, I
> > think the solution should be a flag for KVM_SET_USER_MEMORY_REGION to
> > set the memslot as invalid (KVM_MEM_INVALID), then when removing the
> > memslot which has KVM_MEM_LOG_DIRTY_PAGES enabled, we should:
> > 
> >   - send KVM_SET_USER_MEMORY_REGION with KVM_MEM_INVALID to invalidate
> >     the memslot, but keep the slot and bitmap in KVM
> > 
> >   - send KVM_GET_DIRTY_LOG to fetch the bitmap for the slot
> > 
> >   - send KVM_SET_USER_MEMORY_REGION with size==0 to remove the slot
> 
> Or KVM_MEM_READONLY.

Yeah, I used a new flag because I thought READONLY was a bit tricky to
use directly here.  The thing is, IIUC, if the guest writes to a
READONLY slot then KVM will either ignore the write or trigger an
error (I didn't check which), whereas what we want here is for the
write to fall back to userspace: it should neither be dropped (we
still want the written data to land gracefully in RAM) nor trigger
an error (because the slot is actually writable).
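
To make the proposed flow concrete, here is a minimal sketch in C of
the three ioctls, assuming a hypothetical KVM_MEM_INVALID flag (no such
flag exists in the KVM headers) and omitting all error handling:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical flag for illustration only; the value is arbitrary. */
#define KVM_MEM_INVALID (1UL << 2)

/* Proposed removal sequence for a slot with dirty logging enabled. */
static void remove_dirty_logged_slot(int vm_fd,
                                     struct kvm_userspace_memory_region *mem,
                                     void *dirty_bitmap)
{
    struct kvm_dirty_log log = {
        .slot = mem->slot,
        .dirty_bitmap = dirty_bitmap,
    };

    /* 1) Invalidate the slot so vcpus stop dirtying it directly, but
     *    keep the slot and its bitmap alive inside KVM; writes would
     *    now exit to userspace instead. */
    mem->flags |= KVM_MEM_INVALID;
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem);

    /* 2) Fetch the final dirty bitmap for the slot. */
    ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log);

    /* 3) Remove the slot for real by setting its size to zero. */
    mem->memory_size = 0;
    ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, mem);
}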

> 
> > However I don't know whether that'll be worth it.
> 
> Yes, especially in the light of the dirty ring issue below.
> 
> > (Side question which is irrelevant to this: for kvm dirty ring we now
> >  need to do a similar thing to flush dirty bits before removing a
> >  memslot, however that's even trickier because flushing the dirty ring
> >  needs to kick all vcpus out.  Currently the RFC series is using
> >  run_on_cpu(), which will release the BQL and wait for all vcpus to
> >  quit into userspace, however that cannot be done inside
> >  kvm_set_phys_mem because it needs the BQL.  I'm still thinking about
> >  a good way to fix this, but any idea is greatly welcomed :)
> 
> The problem here is also that the GFN is not a unique identifier of the
> QEMU ram_addr_t.  However you don't really need to kick all vCPUs out,
> do you?  You can protect the dirty ring with its own per-vCPU mutex and
> harvest the pages from the main thread.

I'm not sure I get the point, but just to mention that currently the
dirty GFNs are collected in a standalone thread (in the QEMU series
it's called the reaper thread) rather than in the per-vcpu threads,
because KVM_RESET_DIRTY_RINGS is a per-VM ioctl after all.  One major
reason to kick the vcpus is to make sure the hardware-cached dirty
GFNs (i.e. PML) are flushed synchronously.
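
For reference, roughly how the reaper loop is shaped, as a sketch
rather than the actual RFC code: harvest_vcpu_ring() is a hypothetical
helper standing in for the code that walks one vcpu's mmap'ed dirty
ring and marks the harvested pages in QEMU's bitmaps, and the kernel is
assumed to expose the (still RFC) KVM_RESET_DIRTY_RINGS ioctl:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: walk one vcpu's mmap'ed dirty ring and mark the
 * harvested GFNs dirty in QEMU's own bitmaps; returns entries collected. */
int harvest_vcpu_ring(int vcpu_index);

/* Sketch of the standalone "reaper" loop: collection is per-vcpu, but
 * handing the harvested entries back to KVM is one per-VM ioctl. */
static void reap_dirty_rings(int vm_fd, int nr_vcpus)
{
    int i, collected = 0;

    for (i = 0; i < nr_vcpus; i++) {
        collected += harvest_vcpu_ring(i);
    }

    if (collected) {
        ioctl(vm_fd, KVM_RESET_DIRTY_RINGS, 0);
    }
}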

I think the whole kick operation is indeed too heavy for this even
with the run_on_cpu() trick, because the thing we want (the PML
flush) is actually per-vcpu and needs no BQL interaction.  Do we
have, or need, a lightweight way to kick one vcpu synchronously?  I
was wondering about something like responding to a "sync kick"
request in the vcpu thread right after KVM_RUN returns (when we
don't hold the BQL yet).  Would that make sense?
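
Purely to illustrate the idea (none of this exists in QEMU, all names
below are made up): the vcpu thread could check a per-vcpu request
right after ioctl(KVM_RUN) returns and acknowledge it without touching
the BQL, roughly like:

#include <stdatomic.h>
#include <stdbool.h>
#include <pthread.h>

/* Hypothetical per-vcpu state for a lightweight synchronous kick. */
struct sync_kick {
    atomic_bool     requested;
    pthread_mutex_t lock;
    pthread_cond_t  done;
};

/* Requester side (e.g. the dirty ring reaper): ask one vcpu to ack as
 * soon as it leaves KVM_RUN, then wait for the ack.  No BQL involved. */
static void sync_kick_vcpu(struct sync_kick *sk)
{
    pthread_mutex_lock(&sk->lock);
    atomic_store(&sk->requested, true);
    /* A real implementation would also signal the vcpu to force an exit. */
    while (atomic_load(&sk->requested)) {
        pthread_cond_wait(&sk->done, &sk->lock);
    }
    pthread_mutex_unlock(&sk->lock);
}

/* Vcpu thread side: call right after ioctl(vcpu_fd, KVM_RUN, 0)
 * returns, before re-acquiring the BQL. */
static void sync_kick_ack(struct sync_kick *sk)
{
    if (atomic_load(&sk->requested)) {
        pthread_mutex_lock(&sk->lock);
        atomic_store(&sk->requested, false);
        pthread_cond_broadcast(&sk->done);
        pthread_mutex_unlock(&sk->lock);
    }
}

The point being that once the vcpu has left KVM_RUN and acked, whatever
the hardware had cached (PML) should already have been flushed, without
ever taking the BQL.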

Thanks,

-- 
Peter Xu



