qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] nbd: Add new qemu:joint-allocation metadata context


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 2/2] nbd: Add new qemu:joint-allocation metadata context
Date: Thu, 10 Jun 2021 15:30:17 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

10.06.2021 02:52, Nir Soffer wrote:
On Wed, Jun 9, 2021 at 9:01 PM Eric Blake<eblake@redhat.com>  wrote:
When trying to reconstruct a qcow2 chain using information provided
over NBD, ovirt had been relying on an unsafe assumption that any
portion of the qcow2 file advertised as sparse would defer to the
backing image; this worked with what qemu 5.2 reports for a qcow2 BSD
loaded with "backing":null.  However, in 6.0, commit 0da9856851 (nbd:
server: Report holes for raw images) also had a side-effect of
reporting unallocated zero clusters in qcow2 files as sparse.  This
change is correct from the NBD spec perspective (advertising bits has
always been optional based on how much information the server has
available, and should only be used to optimize behavior when a bit is
set, while not assuming semantics merely because a bit is clear), but
means that a qcow2 file that uses an unallocated zero cluster to
override a backing file now shows up as sparse over NBD, and causes
ovirt to fail to reproduce that cluster (ie. ovirt was assuming it
only had to write clusters where the bit was clear, and the 6.0
behavior change shows the flaw in that assumption).

The correct fix is for ovirt to additionally use the
qemu:allocation-depth metadata context added in 5.2: after all, the
actual determination for what is needed to recreate a qcow2 file is
not whether a cluster is sparse, but whether the allocation-depth
shows the cluster to be local.  But reproducing an image is more
efficient when handling known-zero clusters, which means that ovirt
has to track both base:allocation and qemu:allocation-depth metadata
contexts simultaneously.  While NBD_CMD_BLOCK_STATUS is just fine
sending back information for two contexts in parallel, it comes with
some bookkeeping overhead at the client side: the two contexts need
not report the same length of replies, and it involves more network
traffic.

Aren't both context described in one reply? Or what do you mean by not the same 
length?


So, as a convenience, we can provide yet another metadata context,
"qemu:joint-allocation", which provides the bulk of the same
information already available from using "base:allocation" and
"qemu:allocation-depth" in parallel; the only difference is that an
allocation depth larger than one is collapsed to a single bit, rather
than remaining an integer representing actual depth.  By connecting to
just this context, a client has less work to perform while still
getting at all pieces of information needed to recreate a qcow2
backing chain.
Providing extended allocation is awsome, and makes client life much
easier. But I'm not sure about the name, that comes from "joining"
"base:allocation" and "qemu:allocation-depth". This is correct when
thinking about qemu internals, but this is not really getting both, since
"qemu:allocation-depth" is reduced to local and backing.

 From a client point of view, I think this is best described as 
"qemu:allocation"
which is an extension to NBD protocol, providing the same HOLE and ZERO
bits, and qemu specific info LOCAL, BACKING. Using different "namespace"
("qemu" vs "base") makes it clear that this is not the same.

We discussed in the past the option to expose also the dirty status of every
block in the response. Again this info is available using
"qemu:dirty-bitmap:xxx"
but just like allocation depth and base allocation, merging the results is hard
and if we could expose also the dirty bit, this can make clients life
even better.
In this case I'm not sure "qemu:allocation" is the best name, maybe something
more generic like "qemu:extents" or "qemu:block-status" is even better.


Oops. Could you please describe, what's the problem with parsing several 
context simultaneously?

This all sound to me as we are going to implement "joint" combined conexts for 
every useful combination of existing contexts that user wants. So, it's a kind of 
workaround of inconvenient protocol we have invented in the past.

Doesn't it mean that we instead should rework, how we export several contexts? 
Maybe we can improve generic export of several contexts simultaneously, so that 
it will be convenient for the client? Than we don't need any additional 
combined contexts.

--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]