Re: [Qemu-block] NBD structured reads vs. block size

From: Wouter Verhelst
Subject: Re: [Qemu-block] NBD structured reads vs. block size
Date: Wed, 29 Aug 2018 11:28:05 +0200
User-agent: Mutt/1.10.1 (2018-07-13)

Hi Eric,

On Tue, Aug 28, 2018 at 03:41:24PM -0500, Eric Blake wrote:
> Revisiting this:
> On 08/01/2018 09:41 AM, Eric Blake wrote:
> > Rich Jones pointed me to questionable behavior in qemu's NBD server
> > implementation today: qemu advertises a minimum block size of 512 to any
> > client that promises to honor block sizes, but when serving up a raw
> > file that is not aligned to a sector boundary, attempting to read that
> > final portion of the file results in a structured read with two chunks,
> > the first for the data up to the end of the actual file, and the second
> > reporting a hole for the rest of the sector. If a client is promising to
> > obey block sizes on its requests, it seems odd that the server is
> > allowed to send a result that is not also aligned to block sizes.
> > 
> > Right now, the NBD spec says that when structured replies are in use,
> > then for a structured read:
> > 
> >      The server MAY split the reply into any number of content chunks;
> >      each chunk MUST describe at least one byte, although to minimize
> >      overhead, the server SHOULD use chunks with lengths and offsets as
> >      an integer multiple of 512 bytes, where possible (the first and
> >      last chunk of an unaligned read being the most obvious places for
> >      an exception).
> > 
> > I'm wondering if we should tighten that to require that the server
> > partition the reply chunks to be aligned to the advertised minimum block
> > size (at which point, qemu should either advertise 1 instead of 512 as
> > the minimum size when serving up an unaligned file, or else qemu should
> > just send the final partial sector as a single data chunk rather than
> > trying to report the last few bytes as a hole).

Right, if you have a file that is not a multiple of your minimum block
size, then you'll have a partial sector at the end. I think the proper
response to that is "don't do that, then"; that is, you shouldn't
have a file which is not a multiple of your minimum block size. If you
do, you should just drop the partial sector at the end, IMO. Even if you
don't do that, it does definitely feel wrong to report a hole for a
partial sector if you say you won't accept requests for partial sectors.

I don't think we should make that a requirement, but it does feel like
the proper thing to do.
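
To make that concrete, here is a rough sketch (in Python, purely for
illustration; it is not taken from qemu or any other NBD server). The
helper names usable_export_size and chunks_aligned are made up for this
example. The first drops a trailing partial sector by rounding the
export size down; the second checks whether every chunk of a reply is
aligned to the advertised minimum block size.

```python
def usable_export_size(file_size: int, min_block: int = 512) -> int:
    """Round file_size down to a multiple of the minimum block size.

    Hypothetical helper: any trailing partial block is simply dropped.
    """
    # NBD block sizes are powers of two; reject anything else.
    if min_block <= 0 or min_block & (min_block - 1):
        raise ValueError("minimum block size must be a power of two")
    return file_size - (file_size % min_block)


def chunks_aligned(chunks, min_block: int = 512) -> bool:
    """Return True if every (offset, length) chunk is block-aligned.

    Hypothetical helper: a chunk must describe at least one byte, and
    both its offset and its length must be multiples of min_block.
    """
    return all(
        length > 0 and offset % min_block == 0 and length % min_block == 0
        for offset, length in chunks
    )
```

With a 1000-byte file and a minimum block size of 512, the first helper
yields an export of 512 bytes. A reply for the sector at offset 512
split into 488 bytes of data plus a 24-byte hole fails the second check,
while a single 512-byte data chunk passes it.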

> > For comparison, on block status, we require:
> > 
> >     The server SHOULD use descriptor
> >      lengths that are an integer multiple of 512 bytes where possible
> >      (the first and last descriptor of an unaligned query being the
> >      most obvious places for an exception), and MUST use descriptor
> >      lengths that are an integer multiple of any advertised minimum
> >      block size.
> > 
> > And qemu as a client currently hangs up on any server that violates that
> > requirement on block status (that is, when qemu as the server tries to
> > send a block status that was not aligned to the advertised block size,
> > qemu as the client flags it as an invalid server - which means qemu as
> > server is currently broken).  So I'm thinking we should copy that
> > requirement onto servers for reads as well.
>
> Vladimir pointed out that the problem is not necessarily just limited to the
> implicit hole at the end of a file that was rounded up to sector size.
> Another case where sub-region changes occur in qemu is where you have a
> backing file with 512-byte hole granularity (qemu-img create -f qcow2 -o
> cluster_size=512 backing.qcow2 100M) and an overlay with larger granularity
> (qemu-img create -f qcow2 -b backing.qcow2 -F qcow2 -o cluster_size=4k
> active.qcow2). On a cluster where the top layer defers to the underlying
> layer, it is possible to alternate between holes and data at sector
> boundaries but at subsets of the cluster boundary of the top layer.  As long
> as qemu advertises a minimum block size of 512 rather than the cluster size,
> then this isn't a problem, but if qemu were to change to report the qcow2
> cluster size as its minimum I/O (rather than merely its preferred I/O,
> because it can do read-modify-write on data smaller than a cluster), this
> would be another case where unaligned replies might confuse a client.

Yes. In that case, I think the minimum block size should indeed be 512,
and should not at all be announced as 4k; that's what preferred I/O size
is for.
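
As a sketch of what I mean (Python, illustration only; this is an
assumed policy, not how qemu actually computes its limits, and the name
advertised_block_sizes is invented here): take the granularities of the
layers from top to bottom, advertise the smallest one as the minimum
block size, since that is the finest boundary at which data and holes
can alternate, and advertise the top layer's cluster size as the
preferred size.

```python
def advertised_block_sizes(cluster_sizes):
    """Return (minimum, preferred) block sizes for a stack of layers.

    cluster_sizes lists each layer's granularity in bytes, top layer
    first. Hypothetical policy: the minimum is the finest granularity
    of any layer, the preferred size is the top layer's cluster size.
    """
    if not cluster_sizes:
        raise ValueError("need at least one layer")
    minimum = min(cluster_sizes)
    preferred = cluster_sizes[0]
    return minimum, preferred
```

For the example above (a 4k overlay on top of a backing file with
512-byte clusters), advertised_block_sizes([4096, 512]) gives
(512, 4096).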

Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
