[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/5] block: Fix BDRV_BLOCK_RAW status to honor alignment

From: Eric Blake
Subject: Re: [PATCH 2/5] block: Fix BDRV_BLOCK_RAW status to honor alignment
Date: Thu, 25 Feb 2021 10:03:53 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.0

On 2/25/21 8:55 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.02.2021 23:15, Eric Blake wrote:
>> Previous patches mentioned how the blkdebug filter driver demonstrates
>> a bug present in our NBD server (for example, commit b0245d64); the
>> bug is also present with the raw format driver when probing
>> occurs. Basically, if we specify a particular alignment > 1, but defer
>> the actual block status to the underlying file, and the underlying
>> file has a smaller alignment, then the use of BDRV_BLOCK_RAW to defer
>> to the underlying file can end up with status split at an alignment
>> unacceptable to our layer.  Many callers don't care, but NBD does - it
>> is a violation of the NBD protocol to send status or read results
>> split on an unaligned boundary (in 737d3f5244, we taught our 4.0
>> client to be tolerant of such violations because the problem was even
>> more pronounced with qemu 3.1 as server; but we still need to fix our
>> server for the sake of other stricter clients).

>> + * - BDRV_BLOCK_ALLOCATED is set if the flag is set for at least one
>> subregion
> Hmm about this..
> We already have mess around ALLOCATED:
>  [1] for format drivers it means that "read is handled by this layer,
> not by backing", i.e. data (or zero) is placed exactly on that layer of
> backing-chain

If we're reading at a given granularity, then the 4k read is satisfied
at this layer even if portions of the read came from lower layers.  So
the logic works here.

>  [2] for protocol drivers it's up to the driver, which may always report
> ALLOCATED (being more compatible with format drivers) or it may
> sometimes return UNALLOCATED to show that data is not allocated in FS..

We've been moving away from this particular overload.  What's more, most
protocol drivers that set it at all set it for every single byte,
because protocol layers don't have a notion of a backing file; which
means that if it is set at all, it will be set for every byte anyway, so
the logic works here.

> And this way, bdrv_co_block_status_aligned() is compatible with protocol
> drivers but not with format drivers (as you can't combine
> "go-to-backing" information of several flags, as for some scenarios it's
> safer to consider the whole region ALLOCATED and for another it's safer
> to consider it UNALLOCATED.
> For example for stream target it's safer to consider target block
> UNALLOCATED and do extra copy-on-read operation. And for stream base
> it's safer to consider block ALLOCATED (and again do extra copying, not
> missing something significant).
> I think, to avoid increasing of the mess, we should first split [1] from
> [2] somehow..
> Assume we change it to BDRV_BLOCK_PROTOCOL_ALLOCATED and

Maybe it is indeed worth splitting out two different flags to fully
distinguish between the two overloaded meanings, but that seems like an
independent patch to this series.

> Then, for BDRV_BLOCK_PROTOCOL_ALLOCATED we probably can just report
> BDRV_BLOCK_PROTOCOL_ALLOCATED if at least one of extents reports
> BDRV_BLOCK_PROTOCOL_ALLOCATED. (probably we don't need
> BDRV_BLOCK_PROTOCOL_ALLOCATED at all and can drop this logic)
> But for BDRV_BLOCK_GO_TO_BACKING we'll have to also add
> extents report BDRV_BLOCK_GO_TO_BACKING
>  * BDRV_BLOCK_GO_TO_BACKING if all extents report no
>  * <nothing> if some extents report BDRV_BLOCK_GO_TO_BACKING but others
> not.
> Hmm.. And, I think that there is a problem is in NBD protocol. Actually,
> with allocation-depth context we started to report internal layers of
> backing chain. And if we have layers with different request-alignment
> it's not correct to report allocation-depth "aligned" to top image
> request-alignment.. So, for allocation-depth to work correctly we should
> extend NBD protocol to allow unaligned chunks in BLOCK_STATUS report.

The NBD protocol says that base:allocation must obey allocation rules.
If we want to declare that "because qemu:allocation-depth is an
extension, we choose to NOT obey allocation rules, and if your client
connects to our extension, it MUST be prepared for what would normally
be non-compliant responses to NBD_CMD_BLOCK_STATUS", then we are free to
do so (it is our extension, after all).  Particularly since the only way
I could actually trigger it was with blkdebug (most format layers
support byte-level access, even when layered on top of a protocol layer
with a 512 or 4k minimum byte access).

So if you think it is better for me to respin the patch to fix ONLY
base:allocation bugs, but not qemu:allocation-depth, and instead merely
document the issue there, I could be persuaded to do so.

> So, finally:
> 1. making bitmap export extents aligned is a separate simple thing -
> that's OK
> 2. making base:allocation aligned is possible due to good NBD_STATE_HOLE
> definition. So for it it's correct to combine BDRV_BLOCK_ALLOCATED as
> you do even keeping in mind format layers. We call block_status_above
> for the whole chain. ALLOCATED=0 only if all format layers refer to
> backing and bottom protocol driver(if exists) reports "unallocated in
> FS" so that correspond to
> "If an extent is marked with NBD_STATE_HOLE at that context, this means
> that the given extent is not allocated in the backend storage, and that
> writing to the extent MAY result in the NBD_ENOSPC error"
>    And this way, I'd prefer to fix exactly base:allocation context
> handling in nbd-server not touching generic block_status which already
> has enough mess.
> 3. fixing qemu:allocation-depth I think is impossible without allowing
> unaligned chunks in NBD spec (really, why we should restrict all
> possible metadata contexts so hard?) Or, if we are still going to align
> allocation-depth results to top layer request-alignment we should change
> allocation-depth specification so that that's not "actual allocation
> depth" but something >= than actual allocation depth of all subchunks...
> And that becomes useless.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

reply via email to

[Prev in Thread] Current Thread [Next in Thread]