Re: [Qemu-devel] [PATCH] Introduce cache images for the QCOW2 format

From:	Stefan Hajnoczi
Subject:	Re: [Qemu-devel] [PATCH] Introduce cache images for the QCOW2 format
Date:	Thu, 15 Aug 2013 10:32:30 +0200
User-agent:	Mutt/1.5.21 (2010-09-15)

On Wed, Aug 14, 2013 at 04:20:27PM +0200, Kaveh Razavi wrote:
> Hi,
> 
> On 08/14/2013 11:29 AM, Stefan Hajnoczi wrote:
> > 100 MB is small enough for RAM.  Did you try enabling the host kernel
> > page cache for the backing file?  That way all guests running on this
> > host share a single RAM-cached version of the backing file.
> >
> 
> Yes, indeed. That is why we think it makes sense to store many of these
> cache images on memory, but at the storage node to avoid hot-spotting
> its disk(s). Relying on the page-cache at the storage node may not be
> enough, since there is no guarantee on what stays there.
> 
> The VM host page cache can be evicted at any time, requiring it to go to
> the network again to read from the backing file. Since these cache
> images are small, it is possible to store many of them at the hosts,
> instead of caching many complete backing images that are usually in GB
> order.

I don't buy the argument about the page cache being evicted at any time:

At the scale where caching is important, provisioning a measly 100 MB
of RAM per guest should not be a challenge.

cgroups can be used to isolate page cache between VMs if you want
guaranteed caches.

But it could be more interesting not to isolate so that the page cache
acts host-wide to reduce the overall I/O instead of narrowly focussing
on caching 100 MB for a specific image even if it is rarely accessed.

The real downside I see is that the page cache is volatile, so you could
see heavy I/O if multiple hosts reboot at the same time.

> > The other existing solution is to use the image streaming feature, which
> > was designed to speed up deployment of image files over the network.  It
> > copies the contents of the image from a remote server onto the host
> > while allowing immediate random access from the guest.  This isn't a
> > cache, this is a full copy of the image.
> > 
> 
> Streaming the complete image may work well for some cases, but streaming
> at scale to many hosts at the same time can easily create a bottleneck
> at the network. In most scenarios, only a fraction of the backing file
> is needed during the lifetime of a VM.

Streaming offers a rate limiting parameter so you can tune it to the
network conditions.
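
As a rough illustration of tuning that limit, the sketch below splits a
shared link among hosts and estimates how long a full copy would take.
It assumes the limit is expressed in bytes per second; the link speed,
host count, image size, and the 50% share are made-up numbers, and the
function names are hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def per_host_speed(link_bytes_per_sec, hosts, share=0.5):
    """Bytes/s each host may stream so all hosts together use `share` of the link."""
    return int(link_bytes_per_sec * share / hosts)


def full_copy_seconds(image_bytes, speed_bytes_per_sec):
    """Time to stream the whole image at a fixed rate limit."""
    return image_bytes / speed_bytes_per_sec


# Hypothetical deployment: 20 hosts behind a 100 MiB/s link, 10 GiB template.
speed = per_host_speed(100 * MIB, hosts=20)
seconds = full_copy_seconds(10 * GIB, speed)
```

A lower limit protects the network but stretches the time during which
guests still depend on the remote server.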

Copying the full image doesn't just reduce load on the NFS server, it
also means guests can continue to run if the NFS server becomes
unreachable.  That's an important property for reliability.

> > I'll share an idea of how to turn this into a cache in a second, but first
> > how to deploy this safely.  Since multiple QEMU processes can share a
> > backing file and the cache must not suffer from corruptions due to
> > races, you can use one qemu-nbd per backing image.  The QEMU processes
> > connect to the local read-only qemu-nbd server.
> > 
> > If you want a cache you could enable copy-on-read without the image
> > streaming feature (block_stream command) and evict old data using
> > discard commands.  No qcow2 image format changes are necessary to do
> > this.
> 
> This is an interesting alternative. I may be wrong, but I think there
> are two limitations with this: 1) it is not persistent and 2) you
> cannot enforce a quota.
> 
> (1) is important if you would like to have a pool of these cache images
> that survives a reboot. (2) is important, if the caching medium is a
> scarce resource such as memory and also if you want to make sure that
> only important data blocks get cached (i.e. data blocks needed for booting).

1)
It is persistent.  The backing file chain looks like this:

  /nfs/template.qcow2 <- /local/cache.qcow2 <- /local/vm001.qcow2

The cache is a regular qcow2 image file that is persistent.  The discard
command is used to evict data from the file.  Copy-on-read accesses are
used to populate the cache when the guest submits a read request.
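
To make the read path through this chain concrete, here is a toy Python
model (not QEMU code). It assumes each image is a mapping from cluster
index to data, that an unallocated cluster falls through to the backing
image, and that copy-on-read stores a cluster in the cache layer as it
passes through. The class and attribute names are invented for
illustration.

```python
class Image:
    """Toy stand-in for a qcow2 image: allocated clusters plus an optional backing image."""

    def __init__(self, name, backing=None, clusters=None, copy_on_read=False):
        self.name = name
        self.backing = backing
        self.clusters = dict(clusters or {})
        self.copy_on_read = copy_on_read
        self.backing_reads = 0  # reads this layer forwarded to its backing image

    def read(self, index):
        if index in self.clusters:
            return self.clusters[index]      # allocated here: no need to go further
        if self.backing is None:
            return b"\0"                     # unallocated everywhere: reads as zeroes
        self.backing_reads += 1
        data = self.backing.read(index)
        if self.copy_on_read:
            self.clusters[index] = data      # populate this layer on the way back
        return data

    def discard(self, index):
        self.clusters.pop(index, None)       # evict: next read falls through again


template = Image("/nfs/template.qcow2", clusters={0: b"boot", 1: b"kernel"})
cache = Image("/local/cache.qcow2", backing=template, copy_on_read=True)
vm001 = Image("/local/vm001.qcow2", backing=cache)

first = vm001.read(0)   # misses in vm001 and cache, fetched from the template
second = vm001.read(0)  # served by the cache layer, template is not touched
```

In this model only the first read of a cluster reaches the template;
discarding the cluster from the cache layer makes the next read go back
to it.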

2)
You can set cache size or other parameters as a qemu-nbd option (this
doesn't exist but could be implemented):

  $ qemu-img create -f qcow2 -o backing_file=/nfs/template.qcow2 cache.qcow2
  $ qemu-nbd --options cache-size=100MB,evict=lru cache.qcow2

So it's the qemu-nbd process that performs the cache housekeeping work.
The cache.qcow2 file itself just persists data and isn't aware of cache
settings.
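
The housekeeping could look roughly like the following sketch. It is a
toy model of an LRU policy with a quota, not an existing qemu-nbd
feature: the class, the max_clusters quota, and the fetch callback are
all hypothetical, and eviction is only recorded in a list where a real
implementation would issue a discard.

```python
from collections import OrderedDict


class LruCacheLayer:
    """Toy model of a size-limited cache layer; not part of QEMU."""

    def __init__(self, fetch_from_backing, max_clusters):
        self.fetch_from_backing = fetch_from_backing  # callable: cluster index -> bytes
        self.max_clusters = max_clusters              # hypothetical quota, in clusters
        self.clusters = OrderedDict()                 # least recently used first
        self.discarded = []                           # indexes evicted so far

    def read(self, index):
        if index in self.clusters:
            self.clusters.move_to_end(index)          # mark as most recently used
            return self.clusters[index]
        data = self.fetch_from_backing(index)         # cache miss: go to the backing file
        self.clusters[index] = data                   # copy-on-read populates the cache
        while len(self.clusters) > self.max_clusters:
            victim, _ = self.clusters.popitem(last=False)  # least recently used entry
            self.discarded.append(victim)             # stands in for a discard request
        return data


template = {i: f"cluster-{i}".encode() for i in range(10)}
layer = LruCacheLayer(template.__getitem__, max_clusters=3)

for index in (0, 1, 2, 0, 3):
    layer.read(index)
```

After these reads cluster 1 is the one evicted, because cluster 0 was
touched again before the quota was exceeded.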

Stefan
