Re: [Qemu-devel] Re: Strategic decision: COW format
From: Chunqiang Tang
Subject: Re: [Qemu-devel] Re: Strategic decision: COW format
Date: Sun, 13 Mar 2011 00:51:35 -0500
> It seems that there is great interest in QCOW2's
> internal snapshot feature. If we really want to do that, the right solution is
> to follow VMDK's approach of storing each snapshot as a separate COW file (see
> http://www.vmware.com/app/vmdk/?src=vmdk ), rather than using the reference
> count table. VMDK’s approach can be easily implemented for any COW format, or
> even as a function of the generic block layer, without complicating any COW
> format or hurting its performance.
After the heated debate, I thought more about the right approach to
implementing snapshots, and it became clear to me that there are major
limitations with both VMDK's external snapshot approach (which stores each
snapshot as a separate CoW file) and QCOW2's internal snapshot approach
(which stores all snapshots in one file and uses a reference count table
to keep track of them). I just posted to the mailing list a patch that
implements internal snapshots in FVD in a way that avoids the
limitations of both VMDK and QCOW2.
Let's first list the properties of an ideal virtual disk snapshot
solution, and then discuss how to achieve them.
G1: Do no harm (or avoid being a misfeature), i.e., the added snapshot
code should not slow down the runtime performance of an image that has no
snapshots. This implies that an image without snapshots should neither cache
the reference count table in memory nor update the on-disk reference count
table.
G2: Even better, an image with 1 snapshot runs as fast as an image without
snapshots.
G3: Better still, an image with 1,000 snapshots runs as fast as an
image without snapshots. This basically means getting the snapshot feature
for free.
G4: An image with 1,000 snapshots consumes no more memory than an image
without snapshots. This again means getting the snapshot feature for free.
G5: Regardless of the number of existing snapshots, creating a new
snapshot is fast, e.g., taking no more than 1 second.
G6: Regardless of the number of existing snapshots, deleting a snapshot is
fast, e.g., taking no more than 1 second.
Now let's evaluate VMDK and QCOW2 against these ideal properties.
G1: VMDK good; QCOW2 poor
G2: VMDK ok; QCOW2 poor
G3: VMDK very poor; QCOW2 poor
G4: VMDK very poor; QCOW2 poor
G5: VMDK good; QCOW2 good
G6: VMDK poor; QCOW2 good
The evaluation above assumes a straightforward VMDK implementation that,
when handling a long chain of snapshots, s0<-s1<-s2<- … <-s1000, uses a
chain of 1,000 VMDK driver instances to represent the chain of backing
files. This is slow and consumes a lot of memory, but it is how QEMU's
block device architecture behaves today.
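To make the chain-walking cost concrete, here is a minimal Python sketch of how a read resolves through a chain of external snapshot files when each file is opened as its own driver instance. `Layer` and `read_block` are hypothetical names for illustration, not QEMU's block-layer API, and each file is modeled as an in-memory dict from block index to data.

```python
from typing import Dict, List, Optional, Tuple


class Layer:
    """One external snapshot file, modeled as one open driver instance.

    It holds only the blocks written while it was the active top of the chain.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}


def read_block(chain: List[Layer], index: int) -> Tuple[Optional[bytes], int]:
    """Resolve a read by walking from the active top (last element) to the base.

    Returns the data (None if no layer holds the block) and the number of
    layers probed; a block found only in the base costs len(chain) probes.
    """
    probes = 0
    for layer in reversed(chain):
        probes += 1
        if index in layer.blocks:
            return layer.blocks[index], probes
    return None, probes


# A chain of 1,000 external files, s0 <- s1 <- ... <- s999.
chain = [Layer(f"s{i}") for i in range(1000)]
chain[0].blocks[7] = b"base data"    # block 7 was written only in the base
data, probes = read_block(chain, 7)
print(data, probes)                  # b'base data' 1000
```

Every layer is a separate object, so memory also grows with the chain length, and a block held only by the base pays one probe per file.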
Even if the QEMU architecture is revised and the VMDK implementation
is optimized to the extreme, a fundamental limitation of VMDK (by design
rather than by implementation) remains G6, i.e., deleting a snapshot X in the
middle of a snapshot chain is slow (this is also what I observed with
VMware's software). Because each snapshot is stored as a separate file, when
a snapshot X is deleted, those of X's data blocks that are still needed by
its child Y must be physically copied from file X to file Y, which is
slow, and the VM is halted during the copy operation. QCOW2's internal
snapshot approach avoids this problem. Since all snapshots are stored in
one file, when a snapshot is deleted, QCOW2 only needs to update its
reference count table without physically moving data blocks.
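The difference in deletion cost can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not VMware's or QEMU's code: `delete_external` and `delete_internal` are hypothetical helpers, an external snapshot file is a dict of block data, an internal snapshot is a dict from virtual block to physical cluster, and the active image's own references are ignored.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]   # virtual block index -> data held in one external file
Table = Dict[int, int]      # virtual block index -> physical cluster in the single file


def delete_external(chain: List[Blocks], pos: int) -> int:
    """Delete snapshot chain[pos] (external files) by merging it into its child.

    Any block of the deleted file that the child has not overwritten is still
    needed by the child, so it must be physically copied into the child's file.
    Returns the number of blocks copied.
    """
    parent, child = chain[pos], chain[pos + 1]
    copied = 0
    for index, data in parent.items():
        if index not in child:
            child[index] = data     # stands in for a physical read + write
            copied += 1
    del chain[pos]
    return copied


def delete_internal(snapshots: Dict[str, Table], refcount: Dict[int, int],
                    name: str) -> List[int]:
    """Delete an internal snapshot: only reference counts change, no data moves.

    Returns the clusters whose count dropped to zero and are now free.
    """
    freed = []
    for cluster in snapshots.pop(name).values():
        refcount[cluster] -= 1
        if refcount[cluster] == 0:
            freed.append(cluster)
    return freed


# External: X holds 1,000 blocks; its child Y has overwritten only 10 of them.
x = {i: b"x" for i in range(1000)}
y = {i: b"y" for i in range(10)}
chain = [{}, x, y]                  # base <- X <- Y
copied = delete_external(chain, 1)
print(copied)                       # 990 blocks copied into Y

# Internal: X and Y share cluster 100; cluster 101 is used only by X.
snapshots = {"X": {0: 100, 1: 101}, "Y": {0: 100, 1: 102}}
refcount = {100: 2, 101: 1, 102: 1}
freed = delete_internal(snapshots, refcount, "X")
print(freed)                        # [101], with zero data copied
```

The external case does I/O proportional to the blocks still shared with the child, while the internal case touches only metadata.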
On the other hand, QCOW2's internal snapshots have two major limitations that
hurt runtime performance: caching the reference count table in memory and
updating the on-disk reference count table. If we can eliminate both, we
have an ideal solution. This is exactly what FVD's internal snapshot
solution does. Below is the key observation on why FVD can do it so
efficiently.
In an internal snapshot implementation, the reference count table is used
to track used blocks and free blocks. It serves no other purpose. In FVD,
the "static" reference count table tracks only blocks used by (static)
snapshots; it does not track blocks (dynamically) allocated (on a
write) or freed (on a trim) for the running VM. This is a simple but
fundamental difference with respect to QCOW2, whose reference count table
tracks both the static content and the dynamic content. Because data blocks
used by snapshots are static and do not change unless a snapshot is created
or deleted, there is no need to update FVD's "static" reference count table
while a VM runs, and in fact there is not even a need to cache it in memory.
Data blocks that are dynamically allocated or freed for a running VM are
already tracked by FVD's one-level lookup table (which is similar to
QCOW2's two-level table, but in FVD it is much smaller and faster) even
before the snapshot feature is introduced, so this tracking comes for free.
Updating FVD's one-level lookup table is efficient because of FVD's
journal.
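A minimal Python sketch of this static/dynamic split, assuming whole-block writes (no partial-block copy-on-write) and modeling the in-memory free-block-bitmap that FVD builds at boot (described next) as a set of physical blocks not used by any snapshot. `StaticSnapshotImage` and its fields are hypothetical names for illustration, not FVD's code, and the journal is omitted. The write path touches only the one-level lookup table and in-memory state; under QCOW2's scheme, where the reference count table also tracks dynamic content, the allocation branch would additionally have to update a reference count.

```python
from typing import Dict, Set


class StaticSnapshotImage:
    """Hypothetical model of the static/dynamic split described above.

    lookup: virtual block -> physical block for the running VM (one-level table).
    snapshot_free: physical blocks not used by any snapshot, i.e. the in-memory
        free-block-bitmap modeled as a set; the reference count table itself
        is not held in memory.
    """

    def __init__(self, lookup: Dict[int, int], snapshot_free: Set[int]) -> None:
        self.lookup = lookup
        self.snapshot_free = snapshot_free
        self.mapped = set(lookup.values())   # physical blocks the VM currently uses

    def write(self, vblock: int) -> int:
        """Return the physical block that a whole-block guest write lands in."""
        pblock = self.lookup.get(vblock)
        if pblock is not None and pblock in self.snapshot_free:
            return pblock                    # not shared with any snapshot: write in place
        # Unallocated, or still used by a snapshot: allocate a fresh block.
        candidates = self.snapshot_free - self.mapped
        new = min(candidates)                # raises ValueError if no block is free
        if pblock is not None:
            self.mapped.discard(pblock)      # the snapshot keeps the old block
        self.mapped.add(new)
        self.lookup[vblock] = new            # the only metadata change
        return new


# Physical blocks 0 and 1 belong to a snapshot that the running VM still
# shares for virtual blocks 0 and 1; physical blocks 2-4 are unused.
img = StaticSnapshotImage(lookup={0: 0, 1: 1}, snapshot_free={2, 3, 4})
print(img.write(0))   # 2: block 0 is shared with the snapshot, so copy-on-write
print(img.write(0))   # 2: now private to the VM, written in place
print(img.write(5))   # 3: first write to virtual block 5
print(img.lookup)     # {0: 2, 1: 1, 5: 3}
```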
When the VM boots, FVD scans the reference count table once to build a
so-called free-block-bitmap in memory, which identifies blocks not used by
static snapshots. The reference count table is then thrown away and never
updated while the VM runs. For an image with 1TB of snapshot data, the
free-block-bitmap is only 125KB, i.e., the memory overhead is negligible;
FVD's reference count table is 2MB, and scanning it once at VM boot time
takes no more than 20 milliseconds.
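The boot-time scan is a single linear pass. Below is a minimal Python sketch in which `build_free_bitmap` and `is_free` are hypothetical helper names and the bitmap uses one bit per block (set means not used by any snapshot). The closing arithmetic checks the quoted sizes under an assumption the message does not state: 1MB blocks and a 2-byte reference count per block, in decimal units, which reproduce the 125KB and 2MB figures exactly.

```python
from typing import List


def build_free_bitmap(refcount_table: List[int]) -> bytearray:
    """Scan the static reference count table once.

    Bit i is set when block i is not used by any snapshot. After this pass
    the table itself can be dropped from memory.
    """
    bitmap = bytearray((len(refcount_table) + 7) // 8)
    for i, count in enumerate(refcount_table):
        if count == 0:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def is_free(bitmap: bytearray, i: int) -> bool:
    """True if block i is not used by any snapshot."""
    return bool(bitmap[i // 8] & (1 << (i % 8)))


bitmap = build_free_bitmap([1, 0, 2, 0])          # blocks 1 and 3 unused by snapshots
print([is_free(bitmap, i) for i in range(4)])     # [False, True, False, True]

# Size check (decimal units; block size and counter width are assumptions).
snapshot_data = 10**12          # 1 TB of snapshot data
block_size = 10**6              # assumed 1 MB blocks
refcount_bytes = 2              # assumed 16-bit reference counts
blocks = snapshot_data // block_size
print(blocks // 8)              # 125000 bytes: the 125KB free-block-bitmap
print(blocks * refcount_bytes)  # 2000000 bytes: the 2MB reference count table
```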
In short, FVD's internal snapshot achieves the ideal properties G1-G6
by 1) using the reference count table to track only "static" snapshots, 2)
not keeping the reference count table in memory, 3) not updating the
on-disk "static" reference count table while the VM runs, and 4)
efficiently tracking dynamically allocated blocks by piggybacking on FVD's
other features, i.e., its journal and small one-level lookup table.
Regards,
ChunQiang (CQ) Tang
Homepage: http://www.research.ibm.com/people/c/ctang