Re: [Qemu-devel] Loading snapshot with readonly qcow2 image

From: Eric Blake
Subject: Re: [Qemu-devel] Loading snapshot with readonly qcow2 image
Date: Fri, 14 Dec 2018 14:28:37 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

On 12/14/18 10:03 AM, Michael Spradling wrote:

Can you combine -s (create a writable temp file) with -l to get what you

/me tries:

I can confirm that 'qemu-nbd -s a' lets me write data that is discarded on
disconnect (lsof says a temp file in /var/tmp/vl.XXXXXX was created); and
that 'qemu-nbd -l snap a' lets me read the snapshot data. But mixing the two
fails, and it would be a nice bug to fix.

I briefly looked at the code and it seems to be using the same base
functions as qemu does.  So, if I get this working for the model it
might also start working for qemu-nbd.

Ideally, I do not want to modify old images or create new images with
qemu-img, so I have not been modifying qemu-img, but qemu itself
directly.  My use case will have several snapshots in an image (say
100).  I will then later resume each of these snapshots in a qemu
session in parallel.  This is why I have gone down the route of modifying
the L1 and L2 tables of the temp snapshot file /var/tmp/vl.XXXXX.  My
understanding is that if these are updated and the cluster doesn't exist in
the temp file, the code will then look for it in the backing file.  Still
researching this area.

Right now, the only thing that qemu reads from a backing file is a guest cluster. L1/L2 clusters have to be local to the file that they are describing (there is no way to make an L2 table fall back to the contents of a different cluster in the backing file). It boils down to:

Does the active layer have an L2 mapping for the current cluster being read? Yes - read that cluster. No - ask the backing layer to provide the contents of that cluster (and if copy-on-read is enabled, also write those contents in a fresh allocation so that the current layer no longer has to defer to the backing).

Does the active layer have an L2 mapping for the current cluster containing the data being written? Yes - modify that cluster in place. No - allocate a new cluster, and if the write was for less than a full cluster, also ask the backing layer to provide the contents of the rest of the cluster for a copy-on-write action. After the write, the current layer no longer has to defer to the backing.
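
The two decision procedures above can be sketched as a toy model. This is only an illustration in Python, not qemu's code: the `Layer` class, its dict standing in for the L2 mapping, and the tiny cluster size are all made up for the example, and writes are assumed to stay within a single cluster.

```python
CLUSTER_SIZE = 8  # toy value chosen for readability; real clusters are far larger


class Layer:
    """Hypothetical stand-in for one image layer with an optional backing layer."""

    def __init__(self, backing=None, copy_on_read=False):
        self.mapping = {}  # cluster index -> bytearray (plays the role of the L2 mapping)
        self.backing = backing
        self.copy_on_read = copy_on_read

    def read_cluster(self, index):
        if index in self.mapping:
            # Yes: this layer has a mapping, so read the local cluster.
            return bytes(self.mapping[index])
        if self.backing is None:
            # No mapping and no backing layer: the cluster reads as zeroes.
            return bytes(CLUSTER_SIZE)
        # No: ask the backing layer for the contents.
        data = self.backing.read_cluster(index)
        if self.copy_on_read:
            # Copy-on-read: store a local copy so later reads stay in this layer.
            self.mapping[index] = bytearray(data)
        return data

    def write(self, index, offset, payload):
        if offset < 0 or offset + len(payload) > CLUSTER_SIZE:
            raise ValueError("write must stay inside one cluster")
        if index not in self.mapping:
            # No mapping: allocate a new cluster in this layer.
            if len(payload) == CLUSTER_SIZE or self.backing is None:
                cluster = bytearray(CLUSTER_SIZE)
            else:
                # Partial write: copy-on-write the rest from the backing layer.
                cluster = bytearray(self.backing.read_cluster(index))
            self.mapping[index] = cluster
        # Mapping exists (possibly just created): modify the cluster in place.
        self.mapping[index][offset:offset + len(payload)] = payload


base = Layer()
base.write(0, 0, b"AAAAAAAA")
overlay = Layer(backing=base)
overlay.write(0, 2, b"zz")  # partial write triggers copy-on-write
```

After the partial write, the overlay holds its own full copy of cluster 0 and the base layer is unchanged. Note that only guest data is fetched from the backing layer; the mapping itself always lives in the layer it describes.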

Creating an arbitrary qcow2 file on top of any arbitrary read-only backing layer (including 'qemu-nbd -l snap image') should be doable, even if verbose (since the "backing file" of a qcow2 BDS node can be any other BDS). Providing some shorter command lines, like making 'qemu-nbd -s -l snap image' work so that you don't have to provide your own manual overlay, is thus not a high priority.

I still don't have this working yet and I believe my area of problems is
qcow2_update_snapshot_refcount.  Can anyone explain what this does
exactly?  It seems the function does three different things based on the
value of addend, either -1, 0, or 1, but it's somewhat unclear.

Every cluster of qcow2 is reference-counted, to track which portions of the
file are (supposed to be) in use, as determined by following the metadata trails.
When internal snapshots are used, this is implemented by incrementing the
refcount for each cluster that is reachable both from the snapshot and from
the current L1 table (update_snapshot_refcount +1), then when writing to the
cluster we break the reference count by writing the new data to a new
allocation and decrementing the reference count of the old cluster. When
trimming clusters, we decrement the refcount, and if it goes to 0 the
cluster can be reused for something else.
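
As a rough illustration of that lifecycle, here is a toy sketch in Python. It is not how qemu implements it: the `ToyImage` class and all of its fields are hypothetical, writes always cover a whole cluster, and the addend == 0 case is not modelled at all.

```python
class ToyImage:
    """Hypothetical model: one active mapping, internal snapshots, per-cluster refcounts."""

    def __init__(self):
        self.refcount = {}   # host cluster number -> reference count
        self.data = {}       # host cluster number -> contents
        self.active = {}     # guest cluster index -> host cluster number
        self.snapshots = {}  # snapshot name -> frozen copy of the active mapping
        self.next_free = 0

    def _allocate(self, contents):
        host = self.next_free
        self.next_free += 1
        self.refcount[host] = 1
        self.data[host] = contents
        return host

    def _unref(self, host):
        self.refcount[host] -= 1
        if self.refcount[host] == 0:
            # Nothing references the cluster any more; a real image could reuse it.
            del self.refcount[host]
            del self.data[host]

    def snapshot(self, name):
        self.snapshots[name] = dict(self.active)
        for host in self.active.values():
            self.refcount[host] += 1  # the "+1" pass over every reachable cluster

    def write(self, guest, contents):
        host = self.active.get(guest)
        if host is not None and self.refcount[host] == 1:
            self.data[host] = contents  # sole owner: modify in place
            return
        # Shared or unallocated: write to a new allocation, then drop the old reference.
        self.active[guest] = self._allocate(contents)
        if host is not None:
            self._unref(host)

    def discard(self, guest):
        host = self.active.pop(guest, None)
        if host is not None:
            self._unref(host)  # trimming decrements; at 0 the cluster is freed


img = ToyImage()
img.write(0, b"old")
img.snapshot("snap")   # host cluster 0 now has refcount 2
img.write(0, b"new")   # goes to host cluster 1; host cluster 0 drops back to 1
```

At this point the snapshot still sees b"old" in host cluster 0 while the active layer sees b"new" in host cluster 1, each with a refcount of 1.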

I think I understand this.  That would explain addend being -1 or 1.
I am still unclear why you would call the function with addend being 0.

An addend of 0 allows a couple of callers to temporarily have an inconsistent image for the sake of optimizing a bulk allocation/freeing, followed by informing the refcount table to match, with fewer changes to the cluster containing the refcounts than if the algorithm had to accurately use -1/+1 on a per-cluster basis.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
