qemu-devel

Re: [Qemu-devel] Re: Strategic decision: COW format


From: Anthony Liguori
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
Date: Wed, 23 Feb 2011 08:21:24 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10

On 02/23/2011 03:13 AM, Kevin Wolf wrote:
On 22.02.2011 19:18, Anthony Liguori wrote:
On 02/22/2011 10:15 AM, Kevin Wolf wrote:
On 22.02.2011 16:57, Anthony Liguori wrote:

On 02/22/2011 02:56 AM, Kevin Wolf wrote:

*sigh*

It starts to get annoying, but if you really insist, I can repeat it
once more: These features that you don't need (this is the correct
description for what you call "misfeatures") _are_ implemented in a way
that they don't impact the "normal" case.

Except that they require a refcount table that adds additional metadata
that needs to be updated in the fast path.  I consider that impacting
the normal case.

Like it or not, this requirement exists anyway, without any of your
"misfeatures".

You chose to use the dirty flag in QED in order to avoid having to flush
metadata too often, which is an approach that any other format, even one
using refcounts, can take as well.
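To make the dirty-flag idea concrete, here is a minimal sketch in Python. It is a toy model, not the QED or qcow2 on-disk layout: the class `ToyImage`, its fields, and the flush and check counters are all hypothetical names introduced for illustration. The assumption is that the flag is made durable once before the first allocating write, cleared on a clean close, and that finding it set at open time triggers a consistency check.

```python
class ToyImage:
    """Toy model of an image file whose header carries a dirty flag.

    Everything here is hypothetical and only illustrates the bookkeeping.
    """

    def __init__(self):
        self.disk_dirty = False  # flag as stored in the on-disk header
        self.disk_table = {}     # metadata known to be durable
        self.mem_table = {}      # metadata updated in memory, not yet flushed
        self.flushes = 0         # number of metadata flushes issued
        self.checks = 0          # number of consistency checks at open time

    def open(self):
        if self.disk_dirty:
            # The previous session did not close cleanly, so the metadata
            # has to be checked before it can be trusted again.
            self.checks += 1
            self.disk_dirty = False
            self.flushes += 1
        self.mem_table = dict(self.disk_table)

    def allocating_write(self, guest_cluster, host_cluster):
        if not self.disk_dirty:
            # One flush makes the flag durable before metadata changes.
            self.disk_dirty = True
            self.flushes += 1
        # Later allocating writes update metadata without flushing it.
        self.mem_table[guest_cluster] = host_cluster

    def close(self):
        # Clean shutdown: write the metadata out and clear the flag.
        self.disk_table = dict(self.mem_table)
        self.disk_dirty = False
        self.flushes += 1

    def crash(self):
        # Unclean shutdown: in-memory state is lost, the flag stays set.
        self.mem_table = {}


image = ToyImage()
image.open()
for cluster in range(1000):
    image.allocating_write(cluster, cluster)
image.close()
print(image.flushes, image.checks)
```

In this model the 1000 allocating writes cost two flushes in total, and the price is a check after any crash. Nothing in the sketch depends on whether the metadata is a refcount table or something else, which is the point being made above.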

It's a minor detail, but flushing and the amount of metadata are
separate points.

I agree that they are separate...

The dirty flag prevents metadata from being flushed to disk very often
but the use of a refcount table adds additional metadata.

A refcount table is definitely not required even if you claim the
requirement exists for other features.  I assume you mean to implement
trim/discard support but instead of a refcount table, a free list would
work just as well and would leave the metadata update out of the fast
path (allocating writes) and instead only be in the slow path
(trim/discard).
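The difference can be sketched by counting allocator metadata updates. This is a simplified illustration in Python, not code from qcow2 or QED: `RefcountAllocator`, `FreeListAllocator`, and the `metadata_updates` counter are hypothetical, and the free-list variant assumes new clusters are taken from the end of the file when the list is empty.

```python
class RefcountAllocator:
    """One counter per cluster; every allocation touches the table."""

    def __init__(self):
        self.refcounts = []
        self.metadata_updates = 0

    def allocate(self):
        try:
            cluster = self.refcounts.index(0)  # reuse a freed cluster
            self.refcounts[cluster] = 1
        except ValueError:
            cluster = len(self.refcounts)      # grow the image
            self.refcounts.append(1)
        self.metadata_updates += 1             # fast path pays here
        return cluster

    def discard(self, cluster):
        self.refcounts[cluster] -= 1
        self.metadata_updates += 1


class FreeListAllocator:
    """Free clusters are tracked only after a discard has happened."""

    def __init__(self):
        self.file_size = 0  # in clusters, stands in for the file length
        self.free_list = []
        self.metadata_updates = 0

    def allocate(self):
        if self.free_list:
            self.metadata_updates += 1  # reuse changes the free list
            return self.free_list.pop()
        cluster = self.file_size        # append at end of file
        self.file_size += 1             # no allocator metadata to update
        return cluster

    def discard(self, cluster):
        self.free_list.append(cluster)
        self.metadata_updates += 1      # slow path pays here


refcount, free_list = RefcountAllocator(), FreeListAllocator()
for _ in range(100):
    refcount.allocate()
    free_list.allocate()
print(refcount.metadata_updates, free_list.metadata_updates)
```

With no discards, the refcount variant records one update per allocating write and the free-list variant records none. The sketch ignores the mapping tables that both formats must update anyway, and it says nothing about how often those updates are flushed.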

...but here you're arguing about writing metadata out in the fast path,
so you're actually not interested in the amount of metadata but in the
overhead of flushing it. Which is a problem that's solved.

I'm interested in both.  An extra write is always going to be an extra
write.  The flush just makes it very painful.

A refcount table is essential for internal snapshots and compression,
it's useful for discard and for running on block devices, it's necessary
for avoiding the dirty flag and fsck on startup.

No, as designed today, qcow2 still needs a dirty flag to avoid leaking
blocks.

These are five use cases that I can enumerate without thinking a lot
about it, there might be more. You propose using three different
mechanisms for allowing normal allocations (use the file size), block
devices (add a size field into the header) and discard (free list), and
the other three features, for which you can't think of a hack, you
declare "misfeatures".

No, I only label compression and internal snapshots as misfeatures.
Encryption is a completely reasonable feature.

So even with qcow3, what's the expectation of snapshots?  Are we going
to scale to images with over 1000 snapshots?  I believe snapshot support
in qcow2 is not a feature that has been designed with any serious
thought.  If we truly want to support internal snapshots, let's design
it correctly.

As a format feature, a refcount table really only makes sense if the
refcount is required to be greater than a single bit.  If the refcount of
a block is fixed to 1 bit, there are better data structures to use (like a
free list), and that is the fundamental design difference between qcow2
and qed.
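A small sketch may help show where a refcount larger than one bit comes from. It assumes the usual copy-on-write scheme for internal snapshots, where a snapshot shares clusters with the active image until one side writes. The functions `take_snapshot` and `write_cluster`, the dict-based tables, and the cluster numbers are hypothetical and do not mirror the qcow2 code.

```python
import itertools


def take_snapshot(table, refcounts):
    """Share every mapped cluster with a new snapshot (hypothetical model)."""
    for host_cluster in table.values():
        refcounts[host_cluster] += 1  # now referenced by image and snapshot
    return dict(table)


def write_cluster(table, refcounts, guest_cluster, allocate):
    """Return the host cluster the write may go to, copying on write."""
    host_cluster = table.get(guest_cluster)
    if host_cluster is not None and refcounts[host_cluster] == 1:
        return host_cluster              # exclusive owner: write in place
    new_cluster = allocate()
    refcounts[new_cluster] = 1
    if host_cluster is not None:
        refcounts[host_cluster] -= 1     # the active image lets go of it
    table[guest_cluster] = new_cluster
    return new_cluster


next_free = itertools.count(11)
active = {0: 10}      # guest cluster 0 lives in host cluster 10
refcounts = {10: 1}

snapshot = take_snapshot(active, refcounts)
target = write_cluster(active, refcounts, 0, lambda: next(next_free))
print(target, refcounts, snapshot)
```

After the snapshot, cluster 10 has a refcount of 2, which a single allocated/free bit cannot express. Without snapshots every refcount is 0 or 1, and a free list or bitmap carries the same information.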

Okay, so even assuming that there's something like misfeatures that we
can kick out (with which I strongly disagree), what's the crucial
advantage of free lists that would make you switch the image format?

Performance.  One thing we haven't tested with qcow2 is O_SYNC
performance in the guest but my suspicion is that an O_SYNC workload is
going to perform poorly even with cache=none.
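For reference, an O_SYNC workload of the kind mentioned here could be generated with something like the following Python sketch. It is only a rough illustration, not a benchmark anyone in this thread ran: the function name `timed_sync_writes`, the block size, and the write count are assumptions, and the numbers only mean something when the script runs inside a guest whose disk is the image format under test.

```python
import os
import tempfile
import time


def timed_sync_writes(path, count=64, block_size=4096):
    """Write `count` blocks with O_SYNC and return (bytes_written, seconds)."""
    # O_SYNC is not available on every platform; fall back to plain writes
    # there, in which case the timing says nothing about synchronous I/O.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    try:
        for _ in range(count):
            # With O_SYNC each write returns only after the data (and the
            # metadata needed to read it back) has reached stable storage.
            written += os.write(fd, block)
    finally:
        os.close(fd)
    return written, time.perf_counter() - start


with tempfile.TemporaryDirectory() as directory:
    total, seconds = timed_sync_writes(os.path.join(directory, "sync.dat"))
    print(f"{total} bytes in {seconds:.4f} s")
```

Because every write waits for stable storage, any extra metadata write or flush the image format needs per allocating write shows up directly in the elapsed time.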

Starting with a simple format that we don't have to jump through
tremendous hoops to get reasonable performance out of has a lot of virtues.

Regards,

Anthony Liguori
