Re: [Qemu-devel] [PATCH v0 0/7] Background snapshots
From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH v0 0/7] Background snapshots
Date: Tue, 3 Jul 2018 13:54:47 +0800
User-agent: Mutt/1.10.0 (2018-05-17)
On Mon, Jul 02, 2018 at 03:40:31PM +0300, Denis Plotnikov wrote:
>
>
> On 02.07.2018 14:23, Peter Xu wrote:
> > On Fri, Jun 29, 2018 at 11:03:13AM +0300, Denis Plotnikov wrote:
> > > The patch set adds the ability to make external snapshots while VM is
> > > running.
> >
> > Hi, Denis,
> >
> > This work is interesting, though I have a few questions to ask in
> > general below.
> >
> > >
> > > The workflow to make a snapshot is the following:
> > > 1. Pause the vm
> > > 2. Make a snapshot of block devices using the scheme of your choice
> >
> > Here you explicitly took the snapshot for the block device, then...
> >
> > > 3. Turn on background-snapshot migration capability
> > > 4. Start the migration using the destination (migration stream) of your
> > > choice.
> >
> > ... here you started the VM snapshot. How did you make sure that the
> > VM snapshot (e.g., the RAM data) and the block snapshot will be
> > aligned?
> As the VM has been paused before making the image (disk) snapshot, there
> should be no requests to the original image from then on. All later
> requests go to the disk snapshot.
>
> At that point we have a disk image and its snapshot.
> The image holds a kind of checkpointed state which won't (shouldn't) be
> changed, because all write requests go to the image snapshot.
>
> Then we start the background snapshot, which marks all the memory as
> read-only and writes the first part of the VM state to the VM snapshot file.
> By making the memory read-only we effectively freeze the state of the RAM.
>
> At that point we have the original image and the VM memory content, which
> correspond to each other because the VM isn't running.
>
> Then, the background snapshot thread resumes VM execution while the
> read-only-marked memory is being written to the external VM snapshot
> file. All write accesses to the memory are intercepted, and the pages
> being written to are saved to the VM snapshot (VM state) file with
> priority. Right after being saved, a page is marked read-write again and
> is no longer tracked for later accesses.
>
> This is how we guarantee that the VM snapshot (state) file has the memory
> content corresponding to the moment when the disk snapshot is created.
>
> When the writing ends, we have a VM snapshot (VM state) file whose memory
> content corresponds to the moment when the image snapshot was created.
>
> So, to restore the VM from "the snapshot" we need to use the original
> disk image (not the disk snapshot) and the VM snapshot (VM state with
> saved memory) file.
My bad for not noticing the implication of vm_stop() as the
first step. Your explanation is clear. Thank you!
>
> >
> > For example, in current save_snapshot() we'll quiesce disk IOs before
> > migrating the last pieces of RAM data to make sure they are aligned.
> > I didn't figure out myself on how that's done in this work.
> >
> > > The migration will resume the VM execution by itself
> > > when it has saved the devices' states and is ready to start
> > > writing RAM to the migration stream.
> > > 5. Listen to the migration finish event
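As an aside for readers: assuming the capability name used in this series, the steps above might map onto a QMP session roughly like the following (the device name, file paths, and migration URI are illustrative, not from the series):

```json
{ "execute": "stop" }
{ "execute": "blockdev-snapshot-sync",
  "arguments": { "device": "drive0",
                 "snapshot-file": "/path/to/overlay.qcow2",
                 "format": "qcow2" } }
{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities":
                 [ { "capability": "background-snapshot", "state": true } ] } }
{ "execute": "migrate",
  "arguments": { "uri": "exec:cat > /path/to/vmstate.bin" } }
```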
> > >
> > > The feature relies on a not-yet-merged KVM ability to report the
> > > faulting address.
> > > Please find the KVM patch snippet that makes the patch set work below:
> > >
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -XXXX,X +XXXX,XX @@ static int handle_ept_violation(struct kvm_vcpu *vcpu)
> > >          vcpu->arch.exit_qualification = exit_qualification;
> > > -       return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> > > +       r = kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> > > +       if (r == -EFAULT) {
> > > +               unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gpa >> PAGE_SHIFT);
> > > +
> > > +               vcpu->run->exit_reason = KVM_EXIT_FAIL_MEM_ACCESS;
> > > +               vcpu->run->hw.hardware_exit_reason = EXIT_REASON_EPT_VIOLATION;
> > > +               vcpu->run->fail_mem_access.hva = hva | (gpa & (PAGE_SIZE-1));
> > > +               r = 0;
> > > +       }
> > > +       return r;
> >
> > Just to make sure I fully understand here: so this is some extra KVM
> > work just to make sure the mprotect() trick will work even for KVM
> > vcpu threads, am I right?
>
> That's correct!
> >
> > Meanwhile, I see that you only modified the EPT violation code, so how
> > about legacy hardware and the softmmu case?
>
> Didn't check thoroughly, but the scheme works in TCG mode.
Yeah I guess TCG will work since the SIGSEGV handler will work with
that. I meant the shadow MMU implementation in KVM when
kvm_intel.ept=0 is set on the host. But of course that's not a big
deal for now since that can be discussed in the kvm counterpart of the
work. Meanwhile, considering that this series seems to provide a
general framework for live snapshot, this work is meaningful no matter
what backend magic is used (either mprotect, or userfaultfd in the
future).
Thanks,
--
Peter Xu