From: Max Reitz
Subject: Re: [Qemu-devel] [Qemu-block] Some question about savem/qcow2 incremental snapshot
Date: Wed, 9 May 2018 19:54:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 2018-05-09 12:16, Stefan Hajnoczi wrote:
> On Tue, May 08, 2018 at 05:03:09PM +0200, Kevin Wolf wrote:
>> On 08.05.2018 at 16:41, Eric Blake wrote:
>>> On 12/25/2017 01:33 AM, He Junyan wrote:
>> 2. Make the nvdimm device use the QEMU block layer so that it is backed
>>    by a non-raw disk image (such as a qcow2 file representing the
>>    content of the nvdimm) that supports snapshots.
>>
>>    This part is hard because it requires some completely new
>>    infrastructure such as mapping clusters of the image file to guest
>>    pages, and doing cluster allocation (including the copy on write
>>    logic) by handling guest page faults.
>>
>> I think it makes sense to invest some effort into such interfaces, but
>> be prepared for a long journey.
> 
> I like the suggestion but it needs to be followed up with a concrete
> design that is feasible and fair for Junyan and others to implement.
> Otherwise the "long journey" is really just a way of rejecting this
> feature.
> 
> Let's discuss the details of using the block layer for NVDIMM and try to
> come up with a plan.
> 
> The biggest issue with using the block layer is that persistent memory
> applications use load/store instructions to directly access data.  This
> is fundamentally different from the block layer, which transfers blocks
> of data to and from the device.
> 
> Because of block DMA, QEMU is able to perform processing at each block
> driver graph node.  This doesn't exist for persistent memory because
> software does not trap I/O.  Therefore the concept of filter nodes
> doesn't make sense for persistent memory - we certainly do not want to
> trap every I/O because performance would be terrible.
> 
> Another difference is that persistent memory I/O is synchronous.
> Load/store instructions execute quickly.  Perhaps we could use KVM async
> page faults in cases where QEMU needs to perform processing, but again
> the performance would be bad.

Let me first say that I have no idea how the interface to NVDIMM looks.
I just assume it works pretty much like normal RAM (so the interface is
just that it’s a part of the physical address space).

Also, it sounds a bit like you are already discarding my idea, but here
goes anyway.

Would it be possible to introduce a buffering block driver that presents
an area of RAM/NVDIMM to the guest through an NVDIMM interface (so I
suppose as part of the guest address space)?  For writing, we’d keep a
dirty bitmap on it and then asynchronously move the dirty areas out
through the block layer, basically like the mirror job.  On flushing,
we’d block until everything is clean.
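
To make that concrete, here is a minimal standalone C sketch of the
write side.  All names (PmemBuf, blk_write_cluster, ...) are made up
for illustration and are not existing QEMU APIs, and the dirty bitmap
would presumably have to be fed by something like dirty memory logging
rather than by trapping guest stores:

#include <stdbool.h>
#include <stdint.h>

#define CLUSTER_SIZE  65536
#define NUM_CLUSTERS  1024

typedef struct PmemBuf {
    uint8_t *mem;                   /* buffer mapped into the guest    */
    bool dirty[NUM_CLUSTERS];       /* clusters that need write-back   */
    bool populated[NUM_CLUSTERS];   /* clusters already read in (used
                                       by the read side below)         */
} PmemBuf;

/* Stand-in for a write through the block layer, e.g. into qcow2. */
static void blk_write_cluster(unsigned idx, const uint8_t *data)
{
    (void)idx;
    (void)data;                     /* real I/O omitted in this sketch */
}

/* Mark a cluster dirty when dirty logging reports a guest write;
 * the stores themselves are never trapped. */
static void pmem_buf_mark_dirty(PmemBuf *b, uint64_t offset)
{
    b->dirty[offset / CLUSTER_SIZE] = true;
}

/* Background write-back, conceptually like the mirror job: walk the
 * bitmap and push dirty clusters out through the block layer. */
static void pmem_buf_writeback(PmemBuf *b)
{
    for (unsigned i = 0; i < NUM_CLUSTERS; i++) {
        if (b->dirty[i]) {
            b->dirty[i] = false;    /* clear first; later stores re-dirty */
            blk_write_cluster(i, b->mem + (uint64_t)i * CLUSTER_SIZE);
        }
    }
}

/* A guest flush blocks until the bitmap is clean (a real version
 * would also wait for in-flight requests to complete). */
static void pmem_buf_flush(PmemBuf *b)
{
    pmem_buf_writeback(b);
}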

For reading, we’d follow a COR/stream model, basically, where everything
is unpopulated in the beginning and everything is loaded through the
block layer both asynchronously all the time and on-demand whenever the
guest needs something that has not been loaded yet.
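
The read side could then look like this, continuing the hypothetical
sketch above: a populated bitmap records which clusters have been
loaded, the first access to an unpopulated cluster loads it on demand,
and a background loop streams in the rest.

#include <string.h>                 /* memset */

/* Stand-in for a read through the block layer. */
static void blk_read_cluster(unsigned idx, uint8_t *data)
{
    (void)idx;
    memset(data, 0, CLUSTER_SIZE);  /* real I/O omitted in this sketch */
}

/* Called on the first guest access to a cluster that has not been
 * loaded yet (e.g. from a fault on a still-unmapped page). */
static void pmem_buf_populate(PmemBuf *b, uint64_t offset)
{
    unsigned idx = (unsigned)(offset / CLUSTER_SIZE);

    if (!b->populated[idx]) {
        blk_read_cluster(idx, b->mem + (uint64_t)idx * CLUSTER_SIZE);
        b->populated[idx] = true;   /* now safe to map for the guest */
    }
}

/* Background streaming, like the stream job: keep populating clusters
 * until the buffer holds the whole image. */
static void pmem_buf_stream(PmemBuf *b)
{
    for (unsigned i = 0; i < NUM_CLUSTERS; i++) {
        pmem_buf_populate(b, (uint64_t)i * CLUSTER_SIZE);
    }
}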

Now I notice that this looks pretty much like a backing file model where
we’d constantly run both a stream job and a commit job at the same time.

The user could decide how much memory to use for the buffer, so it could
either hold everything or be partially unallocated.

You’d probably want to back the buffer by NVDIMM normally, so that
nothing is lost on crashes (though this would imply that for partial
allocation the buffering block driver would need to know the mapping
between the area in real NVDIMM and the virtual representation it
presents to the guest).
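
For the partial-allocation case, that mapping could be as simple as a
persistent table of entries like the following (again just a made-up
layout, continuing the sketch above), kept in the NVDIMM-backed buffer
itself so that write-back can resume after a crash:

typedef struct PmemBufMapEntry {
    uint64_t guest_cluster;   /* cluster index in the virtual device   */
    uint64_t buf_slot;        /* slot in the (smaller) real NVDIMM buf */
    bool     dirty;           /* persisted so write-back can resume    */
} PmemBufMapEntry;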

Just my two cents while scanning through qemu-block to find emails that
don’t actually concern me...

Max

> Most protocol drivers do not support direct memory access.  iscsi, curl,
> etc. just don't fit the model.  One might be tempted to implement
> buffering, but at that point it's better to just use block devices.
> 
> I have CCed Pankaj, who is working on the virtio-pmem device.  I need to
> be clear that emulated NVDIMM cannot be supported with the block layer
> since it lacks a guest flush mechanism.  There is no way for
> applications to let the hypervisor know the file needs to be fsynced.
> That's what virtio-pmem addresses.
> 
> Summary:
> A subset of the block layer could be used to back virtio-pmem.  This
> requires a new block driver API and the KVM async page fault mechanism
> for trapping and mapping pages.  Actual emulated NVDIMM devices cannot
> be supported unless the hardware specification is extended with a
> virtualization-friendly interface in the future.
> 
> Please let me know your thoughts.
> 
> Stefan
> 

