From: Pankaj Gupta
Subject: Re: [Qemu-block] [Qemu-devel] Some question about savevm/qcow2 incremental snapshot
Date: Fri, 8 Jun 2018 03:59:24 -0400 (EDT)

Hi Junyan,

AFAIU you are trying to utilize qcow2 capabilities to do incremental
snapshots. As I understand it, the contents of an NVDIMM device (be it
real or emulated) are always backed up in a backing device.

Now the question is how to take a snapshot at some point in time. You are
trying to achieve this with the qcow2 format (I have not checked the code
yet), and I have the following queries:

- Are you implementing this feature for both actual DAX device pass-through
  and emulated DAX?
- Are you using an additional qcow2 disk for storing/taking snapshots? How
  are we planning to use this feature?

The reason I ask is that if we concentrate on integrating qcow2 with DAX,
we will have a full-fledged solution for most of the use cases.

Thanks,
Pankaj 

> 
> Dear all:
> 
> I just switched from the graphics/media field to virtualization at the
> end of last year, so I am sorry that, although I have tried my best, I
> still feel a little dizzy about your previous discussion of NVDIMM via
> the block layer :)
> In today's QEMU we use the SaveVMHandlers functions to handle both
> snapshot and migration, so NVDIMM-kind memory is migrated and
> snapshotted the same way as RAM (savevm_ram_handlers). The difference
> is that an NVDIMM may be huge, and its load and store speed is slower.
> In my usage, with a 256G NVDIMM as the memory backend, it can take more
> than 5 minutes to complete one snapshot save, and afterwards the qcow2
> image is bigger than 50G. For migration this may not be a problem,
> because we need no extra disk space and the guest is not paused during
> the migration process. But for snapshots we need to pause the VM, so
> the user experience is bad, and we have concerns about that.
> I posted this question in January this year but failed to get enough
> replies. Then in March I sent an RFC patch set whose basic idea is to
> use dependent snapshots and the kernel's dirty log tracking to optimize
> this:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-03/msg04530.html
> 
> I use a simple way to handle this (see the sketch after this list):
> 1. Separate the NVDIMM region from RAM when doing a snapshot.
> 2. The first time, dump all the NVDIMM data in the same way as RAM and
> enable dirty log tracking for NVDIMM-kind regions.
> 3. On later snapshots, find the previous snapshot point and add
> references to the clusters it used to store NVDIMM data; this time we
> save only the dirty page bitmap and the dirty pages. Because the
> refcount of the previous NVDIMM data clusters has been increased, we do
> not need to worry about them being deleted.
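
Editorial sketch (not the code from the RFC): a minimal illustration of the
three steps above. Every helper used here (qcow2_ref_prev_clusters,
dirty_log_start, dirty_log_sync, save_page, save_bitmap) is hypothetical
and only stands in for the real qcow2 refcount and KVM dirty-log machinery.

    /* Illustrative only: the first snapshot dumps everything and starts
     * dirty tracking; later snapshots re-reference the old clusters and
     * store just the dirty bitmap plus the dirty pages. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PAGE_SIZE 4096

    typedef struct NvdimmSnapState {
        bool     have_previous;   /* has a full dump been taken yet?  */
        uint64_t prev_off;        /* previous nvdimm vmstate offset   */
        uint64_t prev_len;        /* previous nvdimm vmstate length   */
    } NvdimmSnapState;

    /* Hypothetical helpers standing in for qcow2/KVM functionality. */
    void qcow2_ref_prev_clusters(uint64_t off, uint64_t len);
    void dirty_log_start(void *base, size_t len);
    const uint8_t *dirty_log_sync(size_t npages);   /* one bit per page */
    void save_page(uint64_t pfn, const void *data);
    void save_bitmap(const uint8_t *bitmap, size_t npages);

    void nvdimm_incremental_save(NvdimmSnapState *s, uint8_t *base, size_t len)
    {
        size_t npages = len / PAGE_SIZE;

        if (!s->have_previous) {
            /* Step 2: dump all pages, then enable dirty log tracking. */
            for (size_t pfn = 0; pfn < npages; pfn++) {
                save_page(pfn, base + pfn * PAGE_SIZE);
            }
            dirty_log_start(base, len);
            s->have_previous = true;
            return;
        }

        /* Step 3: keep the previous snapshot's clusters alive by raising
         * their refcount, then write only the bitmap and dirty pages. */
        qcow2_ref_prev_clusters(s->prev_off, s->prev_len);

        const uint8_t *bitmap = dirty_log_sync(npages);
        save_bitmap(bitmap, npages);
        for (size_t pfn = 0; pfn < npages; pfn++) {
            if (bitmap[pfn / 8] & (1u << (pfn % 8))) {
                save_page(pfn, base + pfn * PAGE_SIZE);
            }
        }
    }
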
> 
> I encountered a lot of problems:
> 1. The migration and snapshot logic is mixed and needs to be separated
> for NVDIMM.
> 2. Clusters have alignment requirements. When doing a snapshot we just
> write data to disk contiguously, but because we need to add references
> to clusters we really have to consider alignment. For now I just pad
> the data up to the alignment as a little trick, and I do not think it
> is a good way (see the padding sketch after this list).
> 3. Dirty log tracking may have some performance impact.
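
Editorial sketch for point 2: the padding trick amounts to rounding the
vmstate stream position up to the next cluster boundary before the NVDIMM
data begins, so that whole clusters can be reference-counted. The 64 KiB
cluster size below is qcow2's default, not a value taken from the mail.

    #include <inttypes.h>
    #include <stdio.h>

    /* Bytes of padding needed so the next write starts on a cluster
     * boundary. */
    static uint64_t pad_to_cluster(uint64_t stream_pos, uint64_t cluster_size)
    {
        uint64_t misalign = stream_pos % cluster_size;
        return misalign ? cluster_size - misalign : 0;
    }

    int main(void)
    {
        uint64_t pos = 1234567;          /* current vmstate stream offset */
        uint64_t pad = pad_to_cluster(pos, 64 * 1024);
        printf("pad %" PRIu64 " bytes; nvdimm data starts at %" PRIu64 "\n",
               pad, pos + pad);
        return 0;
    }
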
> 
> In theory this approach can be used to snapshot any kind of huge
> memory; we need to find the balance between guest performance (because
> of dirty log tracking) and snapshot saving time.
> 
> Thanks
> Junyan
> 
> 
> -----Original Message-----
> From: Stefan Hajnoczi [mailto:address@hidden]
> Sent: Thursday, May 31, 2018 6:49 PM
> To: Kevin Wolf <address@hidden>
> Cc: Max Reitz <address@hidden>; He, Junyan <address@hidden>; Pankaj
> Gupta <address@hidden>; address@hidden; qemu block
> <address@hidden>
> Subject: Re: [Qemu-block] [Qemu-devel] Some question about savevm/qcow2
> incremental snapshot
> 
> On Wed, May 30, 2018 at 06:07:19PM +0200, Kevin Wolf wrote:
> > On 30.05.2018 at 16:44, Stefan Hajnoczi wrote:
> > > On Mon, May 14, 2018 at 02:48:47PM +0100, Stefan Hajnoczi wrote:
> > > > On Fri, May 11, 2018 at 07:25:31PM +0200, Kevin Wolf wrote:
> > > > > On 10.05.2018 at 10:26, Stefan Hajnoczi wrote:
> > > > > > On Wed, May 09, 2018 at 07:54:31PM +0200, Max Reitz wrote:
> > > > > > > On 2018-05-09 12:16, Stefan Hajnoczi wrote:
> > > > > > > > On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
> > > > > > > >> On 08.05.2018 at 16:41, Eric Blake wrote:
> > > > > > > >>> On 12/25/2017 01:33 AM, He Junyan wrote:
> > > > > > > >> I think it makes sense to invest some effort into such
> > > > > > > >> interfaces, but be prepared for a long journey.
> > > > > > > > 
> > > > > > > > I like the suggestion but it needs to be followed up with
> > > > > > > > a concrete design that is feasible and fair for Junyan and
> > > > > > > > others to implement.
> > > > > > > > Otherwise the "long journey" is really just a way of
> > > > > > > > rejecting this feature.
> > > 
> > > The discussion on NVDIMM via the block layer has run its course.
> > > It would be a big project and I don't think it's fair to ask Junyan
> > > to implement it.
> > > 
> > > My understanding is this patch series doesn't modify the qcow2
> > > on-disk file format.  Rather, it just uses existing qcow2 mechanisms
> > > and extends live migration to identify the NVDIMM state region so
> > > that the clusters can be shared.
> > > 
> > > Since this feature does not involve qcow2 format changes and is just
> > > an optimization (dirty blocks still need to be allocated), it can be
> > > removed from QEMU in the future if a better alternative becomes
> > > available.
> > > 
> > > Junyan: Can you rebase the series and send a new revision?
> > > 
> > > Kevin and Max: Does this sound alright?
> > 
> > Do patches exist? I've never seen any, so I thought this was just the
> > early design stage.
> 
> Sorry for the confusion, the earlier patch series was here:
> 
>   https://lists.nongnu.org/archive/html/qemu-devel/2018-03/msg04530.html
> 
> > I suspect that while it wouldn't change the qcow2 on-disk format in a
> > way that the qcow2 spec would have to be changed, it does need to
> > change the VMState format that is stored as a blob within the qcow2 file.
> > At least, you need to store which other snapshot it is based upon so
> > that you can actually resume a VM from the incremental state.
> > 
> > Once you modify the VMState format/the migration stream, removing it
> > from QEMU again later means that you can't load your old snapshots any
> > more. Doing that, even with the two-release deprecation period, would
> > be quite nasty.
> > 
> > But you're right, depending on how the feature is implemented, it
> > might not be a thing that affects qcow2 much, but one that the
> > migration maintainers need to have a look at. I kind of suspect that
> > it would actually touch both parts to a degree that it would need
> > approval from both sides.
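
Editorial sketch of Kevin's point about the VMState blob: an incremental
NVDIMM section would have to record which snapshot it is based on, roughly
along these lines. The struct is entirely hypothetical and does not
correspond to any existing QEMU format.

    #include <stdint.h>

    /* Hypothetical on-stream header: without the base snapshot id the
     * loader could not resolve the referenced clusters and resume the VM
     * from the incremental state. */
    typedef struct NvdimmIncrementalHeader {
        char     base_snapshot_id[128]; /* snapshot this delta depends on */
        uint64_t dirty_bitmap_len;      /* bytes of bitmap that follow    */
        uint64_t dirty_page_count;      /* dirty pages stored after it    */
    } NvdimmIncrementalHeader;
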
> 
> VMState wire format changes are minimal.  The only issue is that the previous
> snapshot's nvdimm vmstate can start at an arbitrary offset in the qcow2
> cluster.  We can find a solution to the misalignment problem (I think
> Junyan's patch series adds padding).
> 
> The approach references existing clusters in the previous snapshot's vmstate
> area and only allocates new clusters for dirty NVDIMM regions.
> In the non-qcow2 case we fall back to writing the entire NVDIMM contents.
> 
> So instead of:
> 
>   write(qcow2_bs, all_vmstate_data); /* duplicates nvdimm contents :( */
> 
> do:
> 
>   write(bs, vmstate_data_upto_nvdimm);
>   if (is_qcow2(bs)) {
>       snapshot_clone_vmstate_range(bs, previous_snapshot,
>                                    offset_to_nvdimm_vmstate);
>       overwrite_nvdimm_dirty_blocks(bs, nvdimm);
>   } else {
>       write(bs, nvdimm_vmstate_data);
>   }
>   write(bs, vmstate_data_after_nvdimm);
> 
> Stefan
> 