From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 1/2] block/file-posix: Unaligned O_DIRECT block-status
Date: Tue, 14 May 2019 16:50:39 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1
On 5/14/19 4:42 PM, Max Reitz wrote:
> Currently, qemu crashes whenever someone queries the block status of an
> unaligned image tail of an O_DIRECT image:
> $ echo > foo
> $ qemu-img map --image-opts driver=file,filename=foo,cache.direct=on
> Offset Length Mapped to File
> qemu-img: block/io.c:2093: bdrv_co_block_status: Assertion `*pnum &&
> QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset'
> failed.
>
> This is because bdrv_co_block_status() checks that the result returned
> by the driver's implementation is aligned to the request_alignment, but
> file-posix can fail to do so, which is actually mentioned in a comment
> there: "[...] possibly including a partial sector at EOF".
>
> Fix this by rounding up those partial sectors.
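To make the failure and the fix concrete, here is a minimal standalone C model of the data-extent branch. It is a sketch, not QEMU code: is_aligned(), round_up(), data_extent_pnum() and generic_layer_accepts() are hypothetical stand-ins for QEMU_IS_ALIGNED(), ROUND_UP(), raw_co_block_status() and the assertion in bdrv_co_block_status(). The acceptance check is reduced to the *pnum part of that assertion, alignments are assumed to be powers of two, and the patch's extra cross-check of hole against raw_getlength() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU_IS_ALIGNED() and ROUND_UP();
 * 'align' is assumed to be a power of two. */
static bool is_aligned(int64_t n, int64_t align)
{
    return (n & (align - 1)) == 0;
}

static int64_t round_up(int64_t n, int64_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical model of the data-extent branch: 'hole' is where the data
 * extent ends, which is EOF for a file without holes. */
static int64_t data_extent_pnum(int64_t offset, int64_t bytes, int64_t hole,
                                int64_t align)
{
    int64_t pnum = bytes < hole - offset ? bytes : hole - offset;

    /* A partial sector at EOF would trip the generic layer's assertion,
     * so report the rest of that sector as part of the extent. */
    if (!is_aligned(pnum, align)) {
        pnum = round_up(pnum, align);
    }
    return pnum;
}

/* Simplified version of the check the generic layer applies to *pnum. */
static bool generic_layer_accepts(int64_t pnum, int64_t align)
{
    return pnum > 0 && is_aligned(pnum, align);
}
```

For the `echo > foo` case (a 1-byte file), assuming a 512-byte request_alignment and a 512-byte query at offset 0, the unpatched answer is MIN(512, 1 - 0) = 1, which generic_layer_accepts(1, 512) rejects; data_extent_pnum(0, 512, 1, 512) returns 512 instead.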
>
> There are two possible alternative fixes:
> (1) We could refuse to open unaligned image files with O_DIRECT
> altogether. That sounds reasonable until you realize that qcow2
> does not necessarily fill up its metadata clusters, and that nobody
> runs qemu-img create with O_DIRECT. Therefore, unpreallocated qcow2
> files usually have an unaligned image tail.
Yep, non-starter.
>
> (2) bdrv_co_block_status() could ignore unaligned tails. It actually
> throws away everything past the EOF already, so that sounds
> reasonable.
> Unfortunately, the block layer knows file lengths only with a
> granularity of BDRV_SECTOR_SIZE, so bdrv_co_block_status() usually
> would have to guess whether its file length information is inexact
> or whether the driver is broken.
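A small illustration of the ambiguity, assuming the generic layer only knows lengths as whole 512-byte sectors (SECTOR_SIZE stands in for BDRV_SECTOR_SIZE; both functions are hypothetical): a 1-byte file and a 512-byte file have the same visible length, so an answer ending at byte 1 looks equally plausible for both, even though it is only correct for the first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical: the length the generic layer sees when it only tracks
 * whole sectors, i.e. the byte length rounded up to a sector. */
static int64_t visible_length(int64_t byte_length)
{
    return (byte_length + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}

/* Hypothetical: the most the generic layer can say about an unaligned
 * answer is that it ends somewhere inside the last visible sector; the
 * true byte length needed to decide is gone. */
static bool unaligned_end_might_be_eof(int64_t offset, int64_t pnum,
                                       int64_t visible_len)
{
    int64_t end = offset + pnum;
    return end > visible_len - SECTOR_SIZE && end <= visible_len;
}
```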
Well, if I ever get around to my thread of making the block layer honor
byte-accurate sizes, instead of rounding up, then there is no longer
any inexactness. I think our mails crossed, and you missed another idea
of mine of having block drivers (probably only file-posix, per your
audit) set BDRV_BLOCK_EOF when returning an unaligned answer due to EOF,
as the key for letting the block layer know whether the unaligned answer
was due to size rounding.
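On the generic-layer side, the EOF-flag idea could look roughly like the sketch below. It rests on assumptions: STATUS_EOF is a hypothetical flag modeled on BDRV_BLOCK_EOF (the value here is arbitrary), and answer_acceptable() and normalize_pnum() are illustrative helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag modeled on BDRV_BLOCK_EOF; the value is illustrative. */
#define STATUS_EOF 0x20

/* An aligned answer is always fine; an unaligned one is tolerated only
 * when the driver says the extent stops at EOF. */
static bool answer_acceptable(int status, int64_t pnum, int64_t align)
{
    if (pnum <= 0) {
        return false;
    }
    if (pnum % align == 0) {
        return true;
    }
    return (status & STATUS_EOF) != 0;
}

/* Once accepted, the generic layer could do the rounding itself. */
static int64_t normalize_pnum(int status, int64_t pnum, int64_t align)
{
    if (pnum % align != 0 && (status & STATUS_EOF)) {
        return (pnum + align - 1) / align * align;
    }
    return pnum;
}
```

With this split, a driver bug (an unaligned answer without the flag) is still caught, while a genuine EOF tail is normalized in one place instead of in every driver.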
>
> Fixing what raw_co_block_status() returns is the safest thing to do.
Agree.
>
> There seems to be no other block driver that sets request_alignment and
> does not make sure that it always returns aligned values.
Thanks for auditing.
>
> Cc: address@hidden
> Signed-off-by: Max Reitz <address@hidden>
> ---
> block/file-posix.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index e09e15bbf8..f489a5420c 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -2488,6 +2488,9 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>      off_t data = 0, hole = 0;
>      int ret;
>
> +    assert(QEMU_IS_ALIGNED(offset, bs->bl.request_alignment) &&
> +           QEMU_IS_ALIGNED(bytes, bs->bl.request_alignment));
> +
Can write in one line as:
assert(QEMU_IS_ALIGNED(offset | bytes, bs->bl.request_alignment));
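The combined form matches the two separate checks only because the alignment is a power of two, which request_alignment is expected to be: OR-ing two values leaves the low bits clear exactly when both inputs have them clear. The sketch below checks that exhaustively for small values; is_aligned() is a stand-in for QEMU_IS_ALIGNED() written as a remainder test, and the other names are hypothetical. A non-power-of-two alignment such as 3 breaks the equivalence, since 1 | 2 == 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QEMU_IS_ALIGNED(): a plain remainder test. */
static bool is_aligned(uint64_t n, uint64_t align)
{
    return n % align == 0;
}

/* The two separate checks from the patch. */
static bool both_aligned(uint64_t offset, uint64_t bytes, uint64_t align)
{
    return is_aligned(offset, align) && is_aligned(bytes, align);
}

/* The one-line form suggested above. */
static bool combined_aligned(uint64_t offset, uint64_t bytes, uint64_t align)
{
    return is_aligned(offset | bytes, align);
}

/* Compare both forms for every offset and length below 'limit'. */
static bool forms_agree(uint64_t limit, uint64_t align)
{
    for (uint64_t o = 0; o < limit; o++) {
        for (uint64_t b = 0; b < limit; b++) {
            if (both_aligned(o, b, align) != combined_aligned(o, b, align)) {
                return false;
            }
        }
    }
    return true;
}
```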
>      ret = fd_open(bs);
>      if (ret < 0) {
>          return ret;
> @@ -2513,6 +2516,20 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>          /* On a data extent, compute bytes to the end of the extent,
>           * possibly including a partial sector at EOF. */
>          *pnum = MIN(bytes, hole - offset);
> +
> +        /*
> +         * We are not allowed to return partial sectors, though, so
> +         * round up if necessary.
> +         */
> +        if (!QEMU_IS_ALIGNED(*pnum, bs->bl.request_alignment)) {
> +            int64_t file_length = raw_getlength(bs);
> +            if (file_length > 0) {
> +                /* Ignore errors, this is just a safeguard */
> +                assert(hole == file_length);
> +            }
> +            *pnum = ROUND_UP(*pnum, bs->bl.request_alignment);
> +        }
Reviewed-by: Eric Blake <address@hidden>
bl.request_alignment is normally 1 (making this a no-op), but is
definitely larger for O_DIRECT images (where rounding up and treating
the post-EOF hole the same as the rest of the sector is the same thing
that NBD chose to do).
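Worked numbers for that remark, as a sketch: tail_pnum() is a hypothetical helper giving what the patched code would report for a data extent running from offset to EOF, and round_up() is a stand-in for ROUND_UP() assuming a power-of-two alignment. With an alignment of 1 the round-up changes nothing; with 512 or 4096 the partial tail grows to a full block, so the bytes past EOF are reported like the rest of that block.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ROUND_UP(), assuming a power-of-two alignment. */
static int64_t round_up(int64_t n, int64_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical: bytes reported for a data extent that runs from 'offset'
 * to EOF at 'file_length', after the patch's round-up. */
static int64_t tail_pnum(int64_t offset, int64_t file_length, int64_t align)
{
    int64_t pnum = file_length - offset;
    if (pnum % align != 0) {
        pnum = round_up(pnum, align);
    }
    return pnum;
}
```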
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org