Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH 0/2] deal with BDRV_BLOCK_RAW
Date: Tue, 13 Aug 2019 11:14:13 +0000

13.08.2019 12:33, Vladimir Sementsov-Ogievskiy wrote:
> 13.08.2019 12:01, Vladimir Sementsov-Ogievskiy wrote:
>> 13.08.2019 11:39, Vladimir Sementsov-Ogievskiy wrote:
>>> 12.08.2019 22:50, Max Reitz wrote:
>>>> On 12.08.19 21:46, Max Reitz wrote:
>>>>> On 12.08.19 20:11, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Hi all!
>>>>>>
>>>>>> I'm not sure, is it a bug or a feature, but using qcow2 under raw is
>>>>>> broken. It should be either fixed like I propose (by Max's suggestion)
>>>>>> or somehow forbidden (just forbid backing-file supporting node to be
>>>>>> file child of raw-format node).
>>>>>
>>>>> I agree, I think only filters should return BDRV_BLOCK_RAW.
>>>>>
>>>>> (And not even them, they should just be handled transparently by
>>>>> bdrv_co_block_status().  But that’s something for later.)
>>>>>
>>>>>> Vladimir Sementsov-Ogievskiy (2):
>>>>>>    block/raw-format: switch to BDRV_BLOCK_DATA with BDRV_BLOCK_RECURSE
>>>>>>    iotests: test mirroring qcow2 under raw format
>>>>>>
>>>>>>   block/raw-format.c         |  2 +-
>>>>>>   tests/qemu-iotests/263     | 46 ++++++++++++++++++++++++++++++++++++++
>>>>>>   tests/qemu-iotests/263.out | 12 ++++++++++
>>>>>>   tests/qemu-iotests/group   |  1 +
>>>>>>   4 files changed, 60 insertions(+), 1 deletion(-)
>>>>>>   create mode 100755 tests/qemu-iotests/263
>>>>>>   create mode 100644 tests/qemu-iotests/263.out
>>>>>
>>>>> Thanks, applied to my block-next branch:
>>>>>
>>>>> https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next
>>>>
>>>> Oops, maybe not.  221 needs to be adjusted.
>>>>
>>>
>>>
>>> Hmm yes, I forgot to run the tests. Areas which were zero become data|zero,
>>> which doesn't look good.
>>>
>>> So it's not quite right to report DATA | RECURSE; we should actually report
>>> DATA_OR_ZERO | RECURSE, which is effectively ALLOCATED | RECURSE, as otherwise
>>> DATA will be set in the final result (the generic layer must not drop it,
>>> obviously).
>>>
>>> ALLOCATED is never returned by drivers, but it seems it should be. I'll think
>>> a bit and resend something new.
>>>
>>>
>>
>>
>> Hmm. So we have a raw node and assume a backing chain under it. Who should
>> loop through it, generic code or the raw driver?
>>
>> Currently it looks like generic code is responsible for looping through the
>> filtered chain (backing files and filters), and the driver is responsible for
>> all its children except the filtered child.
>>
>> Or the driver may return something that tells the generic code to handle the
>> whole backing chain of the returned file at once, as it's another backing
>> chain. And it seems even RECURSE doesn't work correctly, as it doesn't handle
>> the backing chain in this recursion. It works better than RAW only because we
>> return it together with the DATA flag, and this DATA flag is kept anyway,
>> regardless of whether zeros are found.
>>
>>
> 
> 
> Hmm, so is it correct that we return DATA | RECURSE if we are not really
> sure that it is data?
>
> If we look at
>
>   * BDRV_BLOCK_DATA: allocation for data at offset is tied to this layer
>
> it seems we should report DATA only if there is an allocation.
> 
>   * DATA ZERO OFFSET_VALID
>   *  t    t        t       sectors read as zero, returned file is zero at offset
>   *  t    f        t       sectors read as valid from file at offset
>   *  f    t        t       sectors preallocated, read as zero, returned file not
> 
> So ZERO alone doesn't guarantee that we may safely read?
>
> So, for qcow2 metadata-preallocated images, what about zero-init? We report
> DATA, probably get ZERO from the file, and finally have DATA | ZERO, which
> guarantees that a read will return zeros. But is that true?
>
> Finally, what does "DATA" mean? That space is allocated and occupies disk
> space? Or does it only mean ALLOCATED, i.e. "read from this layer, not from
> backing", and add additional meaning to ZERO when used together: that a read
> will return zeros for sure?
> 


Continuing the self-discussion.

Consider the following case more closely:
 >   * DATA ZERO OFFSET_VALID
 >   *  f    t        t       sectors preallocated, read as zero, returned file not

It actually means that we must not read from the returned file, as the read
will return wrong data while the clusters are actually zero for the guest.

This is OK when, for example, qcow2 returns this combination with a link to its
file child: it means that if you read from the qcow2 node, you'll see correct
zeros, as qcow2 has special metadata which shows that these clusters are zero.
But if you read from the file directly at the returned offset, you'll see
garbage, so don't do it.

But what if some node returns this combination with file == itself? It actually
means that you must not read, but should call block-status to learn that there
are zeros. So if some format can return this combination with file == itself,
you must not read it directly, but only after checking block status.
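
To make the reading of the table above concrete, here is a minimal C sketch of
how a caller might decide whether it may bypass the node and read the returned
file at the returned offset. The SK_* names and bit values are made up for this
sketch and are not taken from QEMU's headers; only the three rows quoted above
are modelled, and every other combination is treated as "not covered".

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits for this sketch only; values are illustrative, not QEMU's. */
enum {
    SK_DATA         = 1 << 0,
    SK_ZERO         = 1 << 1,
    SK_OFFSET_VALID = 1 << 2,
};

typedef enum {
    SK_FILE_READS_ZERO,   /* t t t: returned file is zero at offset */
    SK_FILE_READS_VALID,  /* t f t: valid data in returned file at offset */
    SK_NODE_ZERO_ONLY,    /* f t t: node reads as zero, returned file may not */
    SK_NOT_COVERED,       /* any combination the quoted rows do not list */
} SketchCase;

/* Map a flag combination onto one of the rows quoted in the mail. */
static SketchCase sk_classify(int flags)
{
    /* Only the three sketch bits are expected here. */
    assert((flags & ~(SK_DATA | SK_ZERO | SK_OFFSET_VALID)) == 0);

    if (!(flags & SK_OFFSET_VALID)) {
        return SK_NOT_COVERED;
    }
    if (flags & SK_DATA) {
        return (flags & SK_ZERO) ? SK_FILE_READS_ZERO : SK_FILE_READS_VALID;
    }
    return (flags & SK_ZERO) ? SK_NODE_ZERO_ONLY : SK_NOT_COVERED;
}

/* True only when reading the returned file directly gives guest-visible data. */
static bool sk_may_read_returned_file(int flags)
{
    SketchCase c = sk_classify(flags);
    return c == SK_FILE_READS_ZERO || c == SK_FILE_READS_VALID;
}
```

With file == itself, the SK_NODE_ZERO_ONLY case is the awkward one: the only
safe way to learn the contents is the status query, not a read.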

And file-posix is an example of such a driver. But file-posix holes are
guaranteed to read as zero, so we can report DATA | ZERO. But this will break
the user experience, which assumes that DATA means occupation of real disk
space.

...
Let me go and re-read what we've documented in the NBD protocol about block
status...

"DATA" turns into NBD_STATE_HOLE (inverted), which formally guarantees nothing
and just notes that there is probably no disk space occupied for this region.
So it's about disk space allocation and says nothing about the correctness of
a read. And NBD_STATE_ZERO guarantees that the region reads as all zeroes.

Looking at the code in nbd/server.c: aha, it calls block_status_above and turns
!ALLOCATED into HOLE, which means that it will never return HOLE for
file-posix.





-- 
Best regards,
Vladimir
