qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] qcow2 journalling draft


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC] qcow2 journalling draft
Date: Thu, 5 Sep 2013 11:35:43 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Sep 03, 2013 at 03:45:52PM +0200, Kevin Wolf wrote:
> This contains an extension of the qcow2 spec that introduces journalling
> to the image format, plus some preliminary type definitions and
> function prototypes in the qcow2 code.
> 
> Journalling functionality is a crucial feature for the design of data
> deduplication, and it will improve the core part of qcow2 by avoiding
> cluster leaks on crashes as well as provide an easier way to get a
> reliable implementation of performance features like Delayed COW.
> 
> At this point of the RFC, it would be most important to review the
> on-disk structure. Once we're confident that it can do everything we
> want, we can start going into more detail on the qemu side of things.
> 
> Signed-off-by: Kevin Wolf <address@hidden>
> ---
>  block/Makefile.objs   |   2 +-
>  block/qcow2-journal.c |  55 ++++++++++++++
>  block/qcow2.h         |  78 +++++++++++++++++++
>  docs/specs/qcow2.txt  | 204 
> +++++++++++++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 337 insertions(+), 2 deletions(-)
>  create mode 100644 block/qcow2-journal.c

Although we are still discussing details of the on-disk layout, the
general design is clear enough to discuss how the journal will be used.

Today qcow2 uses Qcow2Cache to do lazy, ordered metadata updates.  The
performance is pretty good with two exceptions that I can think of:

1. The delayed CoW problem that Kevin has been working on.  Guests
   perform sequential writes that are smaller than a qcow2 cluster.  The
   first write triggers a copy-on-write of the full cluster.  Later
   writes then overwrite the copied data.  It would be more efficient to
   anticipate sequential writes and hold off on CoW where possible.

2. Lazy metadata updates lead to bursty behavior and expensive flushes.
   We do not take advantage of disk bandwidth since metadata updates
   stay in the Qcow2Cache until the last possible second.  When the
   guest issues a flush we must write out dirty Qcow2Cache entries and
   possibly fsync between them if dependencies have been set (e.g.
   refcount before L2).

How will the journal change this situation?  Writes that go through the
journal are doubled - they must first be journalled, fsync, and then
they can be applied to the actual image.

How do we benefit by using the journal?

Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]