[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 3/3] block: fail on open when file size is unaligned to reque

From: Max Reitz
Subject: Re: [PATCH 3/3] block: fail on open when file size is unaligned to request_alignment
Date: Mon, 23 Mar 2020 14:00:31 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 12.03.20 13:06, Vladimir Sementsov-Ogievskiy wrote:
> 12.03.2020 14:59, Vladimir Sementsov-Ogievskiy wrote:
>> 11.03.2020 14:06, Max Reitz wrote:
>>> On 30.01.20 16:22, Vladimir Sementsov-Ogievskiy wrote:
>>>> Prior to the commit the following command lead to crash:
>>>>    ./qemu-io --image-opts -c 'write 0 512' \
>>>>    driver=blkdebug,align=4096,image.driver=null-co,image.size=512
>>>> It failes on assertion in bdrv_aligned_pwritev:
>>>>    "end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE"
>>>> The problem is obvious: 512 is aligned to 4096 and becomes larger than
>>>> file size. And the core bad thing is that file size is unaligned to
>>>> request_alignment.
>>>> Let's catch such case on bdrv_open_driver and fail.
>>> I think we had a discussion on this before, but I can’t find it right
>>> now.  (Although I think that had more to do with something in the
>>> file-posix driver, because it wasn’t limited to alignments above 512.)
>>> In any case, the file itself is totally valid.  Most importantly, qcow2
>>> will regularly create files with unaligned file lengths.
>>> So let me create a qcow2 image on a 4k-aligned device:
>>> $ truncate 512M fs.img

Oops, “truncate --size=512M fs.img”, of course.

>>> $ sudo losetup -f --show -b 4096 fs.img
>>> /dev/loop0
>>> $ sudo mkfs.ext4 /dev/loop0
>>> [...]
>>> $ sudo mount /dev/loop0 /mnt/tmp
>>> $ sudo ./qemu-img create -f qcow2 /mnt/tmp/foo.qcow2 64M
>>> Formatting '/mnt/tmp/foo.qcow2', fmt=qcow2 size=67108864
>>> cluster_size=65536 lazy_refcounts=off refcount_bits=16
>>> $ sudo ./qemu-io -t none -c quit /mnt/tmp/foo.qcow2
>>> qemu-io: can't open device /mnt/tmp/foo.qcow2: File size is unaligned to
>>> request alignment
>>> Which is too bad.
>> What exactly is bad?

That with this patch you can no longer create a qcow2 image on a 4k
block size medium and then open it with O_DIRECT.

>> Is it correct that create succeeded? Without new error, how would
>> qcow2 driver
>> read from unaligned tail of file-posix? It will crash, isn't it?

Well, the above test works, which is better already.

Basically part of the problem seems to be changing from unsafe caching
(during create) to O_DIRECT.  If we used O_DIRECT all the time, the
image would always be aligned to the block alignment.

Now that makes me wonder how such cases are handled right now.  Do we crash?

So I tried to create an image with 512 byte clusters, write to it with
normal WB caching, thus creating an image that isn’t aligned to 4k:

$ sudo ./qemu-img create -f qcow2 -o cluster_size=512 \
      /mnt/tmp/foo.qcow2 64M
$ sudo ./qemu-io -c 'write -P 42 0 512' /mnt/tmp/foo.qcow2
$ printf '%x\n' $(stat -c '%s' /mnt/tmp/foo.qcow2)

(So not aligned to 4k)

$ sudo ./qemu-io -t none -c 'read -P 42 0 512' /mnt/tmp/foo.qcow2
read 512/512 bytes at offset 0
512 bytes, 1 ops; 00.00 sec (4.567 MiB/sec and 9352.6997 ops/sec)

So actually, that works just fine right now.

When I let file-posix print all pread() calls, this is what I can see:

pread(9, buf_ofs=+0, file_ofs=0x4000, size=0x1000)
-> 2560
pread(9, buf_ofs=+0xa00, file_ofs=0x4a00, size=0x600)
-> 0

So all reads are increased to the alignment (4k), and to read the data
cluster, we read all 4k from 0x4000 (even though it actually is just 512
bytes at 0x4800).  This succeeds, even though just as a partial read
(2560 == 0xa00), so we try to read the rest of the 4k block, but that
doesn’t work (it’s at the EOF), and so we stop reading.

handle_aiocb_rw() then zeroes out the rest of the buffer (below the
“out” label) and everyone’s happy.

So to me it looks like it works perfectly fine right now.  No crashes.

> Hmm, it crashes only on write-part if don't have RESIZE permission.. So
> for qcow2
> everything is OK.
> And generic read don't care about reading past-EOF because of alignment,
> read is passed
> to driver.

(And only after the test, I realize that you apparently answered the
question yourself...  Oops.)

>>> So the real solution would probably...  Be to align the file size up to
>>> the alignment?
>> On creation, you mean?

Yes.  But I suspect we’d need to do it all the time, because see my
above example: The user might access an image with WB caching at one
point, and then with O_DIRECT the next time.

But I can see two problems:

First, we don’t even know what the block alignment is.  It’s hard enough
to figure it out for O_DIRECT images, I’m not sure we can reliably do so
for WB-cached images.

Second, this wouldn’t help with pre-existing images.  And note that
“pre-existing” might mean an image a user creates on an FS with 512 byte
blocks and then moves it to another with 4k blocks.

So I don’t think aligning the file size up would work, actually.  Except
if we decided to always align it up to 4k, but then we’d still have a
problem with older pre-existing images...


>>> Max
>>>> Note, that file size and request_alignment may become out of sync
>>>> later, so this commit is not full fix of the problem, but it's better
>>>> than nothing.
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>> ---
>>>>   block.c | 7 +++++++
>>>>   1 file changed, 7 insertions(+)
>>>> diff --git a/block.c b/block.c
>>>> index ecd09dbbfd..4cfc6c33a2 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -1324,6 +1324,13 @@ static int bdrv_open_driver(BlockDriverState
>>>> *bs, BlockDriver *drv,
>>>>       assert(bdrv_min_mem_align(bs) != 0);
>>>>       assert(is_power_of_2(bs->bl.request_alignment));
>>>> +    if (bs->bl.request_alignment > 512 &&
>>>> +        !QEMU_IS_ALIGNED(bs->total_sectors,
>>>> bs->bl.request_alignment / 512))
>>>> +    {
>>>> +        error_setg(errp, "File size is unaligned to request
>>>> alignment");
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>>       for (i = 0; i < bs->quiesce_counter; i++) {
>>>>           if (drv->bdrv_co_drain_begin) {
>>>>               drv->bdrv_co_drain_begin(bs);

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]