From: Hanna Reitz
Subject: Re: Inconsistent block status reply in qemu-nbd in qemu-img-6.2.0
Date: Mon, 17 Jan 2022 14:01:03 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.0
On 17.01.22 12:38, Kevin Wolf wrote:
> Am 17.01.2022 um 10:55 hat Hanna Reitz geschrieben:
>> On 17.01.22 10:52, Kevin Wolf wrote:
>>> Am 17.01.2022 um 09:46 hat Hanna Reitz geschrieben:
>>>> On 16.01.22 19:09, Nir Soffer wrote:
>>>>> On Sun, Jan 16, 2022 at 1:17 PM Nir Soffer <nsoffer@redhat.com> wrote:
>>>>>> Some of our tests started to fail since qemu-img 6.2.0 was
>>>>>> released. Here is an example failure:
>>>>>>
>>>>>> $ truncate -s1g /data/scratch/empty-1g.raw
>>>>>> $ qemu-nbd -r -t -e8 -k /tmp/nbd.sock -A -f raw /data/scratch/empty-1g.raw &
>>>>>
>>>>> git bisect points to this commit:
>>>>>
>>>>> $ git bisect good
>>>>> 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2 is the first bad commit
>>>>> commit 0bc329fbb009f8601cec23bf2bc48ead0c5a5fa2
>>>>> Author: Hanna Reitz <hreitz@redhat.com>
>>>>> Date:   Thu Aug 12 10:41:44 2021 +0200
>>>>>
>>>>>     block: block-status cache for data regions
>>>>>
>>>>> The commit message says:
>>>>>
>>>>>     (Note that only caching data regions but not zero regions means
>>>>>     that returning false information from the cache is not
>>>>>     catastrophic: Treating zeroes as data is fine. While we try to
>>>>>     invalidate the cache on zero writes and discards, such
>>>>>     incongruences may still occur when there are other processes
>>>>>     writing to the image.)
>>>>>
>>>>> I don't think it is fine to report zero as data. This can cause
>>>>> severe performance issues when users copy unneeded zero extents
>>>>> over the network, instead of doing no work with zero bandwidth.
>>>>
>>>> You're right, it was meant as a contrast to whether this might lead
>>>> to catastrophic data failure. But it was also meant as a risk
>>>> estimation. There wasn't supposed to be a situation where the cache
>>>> would report zeroes as data (other than with external writers), and
>>>> so that's why I thought it'd be fine. But, well, things are supposed
>>>> to be one way, and then you (me) write buggy code...
>>>>
>>>> Thanks for the reproducer steps, I can reproduce the bug with your
>>>> script (not with nbdinfo, fwiw) and the problem seems to be this:
>>>>
>>>> diff --git a/block/io.c b/block/io.c
>>>> index bb0a254def..83e94faeaf 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>> @@ -2498,7 +2498,8 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
>>>>       * the cache requires an RCU update, so double check here to avoid
>>>>       * such an update if possible.
>>>>       */
>>>> -    if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
>>>> +    if (want_zero &&
>>>> +        ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
>>>>          QLIST_EMPTY(&bs->children)) {
>>>>          /*
>>>>
>>>> We should update the cache only when we have accurate zero
>>>> information, so only if want_zero was true.
>>>
>>> Either that, or store the want_zero value as well and compare it
>>> before using the cache. Depends on whether we need the optimisation
>>> for want_zero=false cases, too.
>>>
>>> I think this mostly means bdrv_is_allocated(_above)(), which seems
>>> to be relevant for the commit and stream jobs and 'qemu-img rebase'
>>> at least, and some less commonly used stuff. There are some other
>>> cases of is_allocated (like in mirror) where it doesn't make a
>>> difference because we always process the maximum number of bytes
>>> with the first call.
>>
>> I think we need the cache only for want_zero=true cases, because
>> allocation information is handled by the format layer, which doesn't
>> use the cache at all. Also, the reason for the cache was to skip slow
>> SEEK_HOLE/DATA calls on the protocol level, which is a want_zero=true
>> case.
>
> Hm, good point. But we would still call the protocol driver with
> want_zero=false for raw images.
>
> Obviously, this removes the commit case because it calls is_allocated
> only on the overlay, and that can't be raw. However, streaming (or
> more specifically the copy-on-read driver) does seem to include the
> base image in its bdrv_is_allocated_above(), which may be raw. So it
> looks to me as if the optimisation were still relevant when you're
> streaming from a raw base image into a qcow2 overlay.
I don’t think it usually is, because AFAIR want_zero was introduced to speed up these cases where we only want allocation information, and the driver should deliver zero information only when it comes at no additional cost.
For most protocol drivers, there is no other information to query, so “at no additional cost” is the same as “at no cost”, which seems impossible. I’d therefore assume that most drivers just return everything as allocated with want_zero=false (file-posix does this), so a cache won’t gain us anything. If drivers happen to be able to perform a miracle and really provide the information at no cost, then we don’t need a cache either.
NBD, however, is a protocol driver that does have other information to query, namely allocation information, and zero information falls out of that as a side effect. So there, caching would help, I suppose. But then again, we didn’t really add the cache to be able to skip some NBD requests...
Conceptually, I think want_zero and the BSC are two solutions to the same problem. We have introduced the BSC because we couldn’t make all relevant calls pass want_zero=false, and so the BSC is a solution specifically for want_zero=true.
I can’t rule out that under very specific circumstances we might get performance benefits from the BSC even in the want_zero=false case (when using the NBD block driver, that is), but... That’s just not the case I wrote the code for, and so I don’t feel comfortable using it for that case.
Hanna