qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Wed, 20 Nov 2019 10:20:10 +0000

16.11.2019 19:34, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> I wanted to understand, what is the real difference between 
> bdrv_block_status_above
> and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work through
> bdrv_block_status_above..
> 
> And I found the problem: bdrv_is_allocated_above considers space after EOF as
> UNALLOCATED for intermediate nodes..
> 
> UNALLOCATED is not about allocation at fs level, but about should we go to 
> backing or
> not.. And it seems incorrect for me, as in case of short backing file, we'll 
> read
> zeroes after EOF, instead of going further by backing chain.
> 
> This leads to the following effect:
> 
> ./qemu-img create -f qcow2 base.qcow2 2M
> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> 
> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> 
> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> 
> But after commit guest visible state is changed, which seems wrong for me:
> ./qemu-img commit top.qcow2 -b mid.qcow2
> 
> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> 
> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> 
> 
> I don't know, is it a real bug, as I don't know, do we support backing file 
> larger than
> its parent. Still, I'm not sure that this behavior of bdrv_is_allocated_above 
> don't lead
> to other problems.
> 
> =====
> 
> Hmm, bdrv_block_allocated_above behaves strange too:
> 
> with want_zero=true, it may report unallocated zeroes because of short 
> backing files, which
> are actually "allocated" in POV of backing chains. But I see this may 
> influence only
> qemu-img compare, and I don't see can it trigger some bug..
> 
> with want_zero=false, it may do no progress because of short backing file. 
> Moreover it may
> report EOF in the middle!! But want_zero=false used only in 
> bdrv_is_allocated, which considers
> onlyt top layer, so it seems OK.
> 
> =====
> 
> So, I propose these series, still I'm not sure is there a real bug.
> 
> Vladimir Sementsov-Ogievskiy (4):
>    block/io: fix bdrv_co_block_status_above
>    block/io: bdrv_common_block_status_above: support include_base
>    block/io: bdrv_common_block_status_above: support bs == base
>    block/io: fix bdrv_is_allocated_above
> 
>   block/io.c                 | 104 ++++++++++++++++++-------------------
>   tests/qemu-iotests/154.out |   4 +-
>   2 files changed, 53 insertions(+), 55 deletions(-)
> 


Interesting that the problem illustrated here is not fixed by the series, it's 
actually
relates to the fact that mirror does truncation with PREALLOC_MODE_OFF, which 
leads
to unallocated qcow2 clusters, which I think should be fixed too.

To illustrate the problem fixed by the series, we should commit to base:

# ./qemu-img commit top.qcow2 -b base.qcow2
Image committed.
# ./qemu-io -c "read -P 0 1M 1M" base.qcow2
Pattern verification failed at offset 1048576, 1048576 bytes
read 1048576/1048576 bytes at offset 1048576
1 MiB, 1 ops; 00.00 sec (5.366 GiB/sec and 5494.4149 ops/sec)

=======

Hmm, but how to fix the problem about truncate? I think truncate must not make
underlying backing available for read.. Discard operation doesn't do it.

So, actually on PREALLOC_MODE_OFF we must allocated L2 tables and mark new 
clusters
as ZERO?

To improve it, we can add "zero bit" to L1 table entry..

-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]