
From: Markus Armbruster
Subject: Re: [PATCH 0/4] qemu-img: Fix exit code for errors closing the image
Date: Wed, 15 Feb 2023 14:07:57 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Kevin Wolf <kwolf@redhat.com> writes:

> Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben:
>> Drive-by comment...
>> 
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > This series addresses the problem described in these bug reports:
>> > https://gitlab.com/qemu-project/qemu/-/issues/1330
>> > https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>> >
>> > qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
>> > However, when the function is called through blk_unref(), in the case of
>> > such errors, while an error message is written to stderr, the callers
>> > never see an error return. Specifically, 'qemu-img bitmap/commit' are
>> > reported to exit with exit code 0 despite the errors.
>> 
>> After having read the "potential alternative" below, I figure this
>> failure happens within blk_unref().  But I can't see a call chain.  Am I
>> confused?
>
> When I put an abort() into the error path:
>
> #0  0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007ffff6a54d76 in raise () from /lib64/libc.so.6
> #2  0x00007ffff6a287f3 in abort () from /lib64/libc.so.6
> #3  0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705
> #4  0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741
> #5  0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770
> #6  0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939
> #7  0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330
> #8  0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850
> #9  0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207
> #10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895
> #11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479
> #12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at ../block/block-backend.c:537
> #13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820
> #14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450

Thanks!
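
So the whole close path hangs off blk_unref(), which returns void.  As a
minimal sketch of why the exit code stays 0 (hypothetical fragment, not
the actual qemu-img code):

    static int img_bitmap_like(BlockBackend *blk)
    {
        int ret = 0;
        /* ... bitmap operations, any failure recorded in ret ... */
        blk_unref(blk);   /* may run qcow2_close() -> qcow2_inactivate() */
        return ret;       /* still 0 even if inactivation just failed */
    }

Whatever qcow2_inactivate() reports can only go to stderr; there is no
return value to carry it back up that chain.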

>> > The solution taken here is inactivating the images first, which can
>> > still return errors, but already performs all of the write operations.
>> > Only then are the images actually blk_unref()-ed.
>> >
>> > If we agree that this is the way to go (as a potential alternative,
>> > allowing blk_unref() to fail would require changes in all kinds of
>> > places, many of which probably wouldn't even know what to do with the
>> > error),
>> 
>> blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
>> Correct?
>
> I think so, yes.

Thanks again!
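
For reference, blk_unref() is roughly this shape (simplified from
block/block-backend.c; the real function also drains the backend before
deleting it), which is why the final unref is the only point where a
close error could surface:

    void blk_unref(BlockBackend *blk)
    {
        if (blk) {
            assert(blk->refcnt > 0);
            if (blk->refcnt > 1) {
                blk->refcnt--;        /* nothing here can fail */
            } else {
                blk->refcnt = 0;
                blk_delete(blk);      /* -> bdrv_unref() -> bdrv_close() */
            }
        }
    }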

>> We have a bunch of "unref" functions in the tree, and, as far as I can
>> tell from a quick grep, none of them can fail.  Supports your apparent
>> preference for not changing blk_unref().
>> 
>> >         then I suppose doing the same for other qemu-img subcommands
>> > would make sense, too.
>> 
>> I was about to ask whether there could be more silent failures like the
>> ones in commit and bitmap.  This suggests there are.
>> 
>> Say we do the same for all known such failures.  Would any remaining (or
>> new) such failures be programming errors?
>
> Let's be honest: What I'm proposing here is not pretty and not a full
> solution; it only covers the simplest part of the problem, which happens
> to be the part that has shown up in practice.
>
> If you have a good idea how to solve the general problem, I'm all ears.
>
> I haven't checked other qemu-img subcommands, but I don't see why they
> wouldn't be able to run into an error in .bdrv_close. They could be
> fixed the same way.
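
For the simple qemu-img case I imagine the pattern is roughly this
(sketch only, assuming bdrv_inactivate_all() returns a negative errno on
failure; the actual series may differ in the details):

    /* ... subcommand work done, blk still referenced ... */
    ret = bdrv_inactivate_all();    /* does the write-back, can fail */
    if (ret < 0) {
        error_report("Error while closing the image: %s", strerror(-ret));
        ret = 1;
    }
    blk_unref(blk);                 /* close has little left to do now */
    return ret;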
>
> The next level in difficulty might be QMP block-delete. It's still easy
> because like in qemu-img, we know that we're freeing the last reference,
> and so we could actually do the same here. Well, more or less the same
> at least: Obviously not inactivate_all(), but just for a single node. We
> also need to do this recursively for children, except only for those
> that would actually go away together with our parent node and aren't
> referenced elsewhere. Even if we manage to implement this correctly,
> what do we do with the error? Would returning a QMP error imply that we
> didn't actually close the image and it's still valid (and not
> inactivated)?
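
To make that concrete, I picture something like the sketch below.  Every
helper in it is hypothetical; it is only meant to show the shape of
"inactivate this node plus the children that only it keeps alive":

    /* All names hypothetical; BlockDriverState/BdrvChild as in the block layer */
    static int inactivate_subtree_about_to_go_away(BlockDriverState *bs)
    {
        BdrvChild *child;
        int ret = bdrv_inactivate_single(bs);                /* hypothetical */

        if (ret < 0) {
            return ret;
        }
        QLIST_FOREACH(child, &bs->children, next) {
            if (!bdrv_referenced_elsewhere(child->bs)) {     /* hypothetical */
                ret = inactivate_subtree_about_to_go_away(child->bs);
                if (ret < 0) {
                    return ret;
                }
            }
        }
        return 0;
    }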
>
> Too easy? Let's make it a bit harder. Let's say a commit job completes
> and we're now removing the intermediate nodes. One of these images could
> in theory fail in .bdrv_close. We have successfully committed the data,
> the new graph is ready and in good state. Just one of the old images
> we're throwing out runs into ENOSPC in its .bdrv_close. Where do we
> report that error? We don't even necessarily have a QMP command here; we
> could only let the whole block job fail, which is probably not a good
> way to let libvirt know what was happening. Also, we can't just
> unconditionally inactivate the image beforehand there; it might still be
> in use by other references.  Which may actually be dropped while we're
> draining the node in bdrv_close().
>
> Not enough headaches yet? There are plenty of places in QEMU that just
> want to make sure that the node doesn't go away while they are still
> doing something with it. So they use a bdrv_ref/unref pair locally.
> These places could end up freeing the last reference if the node would
> have gone away otherwise. They are almost certainly very confusing
> places to report the error. They might not even be places that can return
> errors at all currently.

Yes.
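
The pattern is easy to picture:

    bdrv_ref(bs);
    /* ... drain, reopen, or otherwise work on bs ... */
    bdrv_unref(bs);   /* may quietly become the unref that closes bs */

If everyone else dropped their references in the meantime, that last
line is where the image actually gets closed, in a function with no
natural way to report the failure.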

> So the main reason why I'm not doing this properly by returning the
> errors from qcow2_close() (and .bdrv_close in all other drivers) through
> bdrv_unref() down to the callers of that is not only that it would be a
> major conversion that would touch lots of places, but also that I
> wouldn't even know what to do with the error in most callers. And that
> I'm not sure what the semantics of an error in a close function should
> be.

Understand.

> Another thing that could be tried is making failure in .bdrv_close less
> likely by doing things earlier. At least ENOSPC could probably be
> avoided if dirty bitmap clusters were allocated during the write
> request that first sets a bit in them (I know too little about the
> details of how bitmaps are implemented in qcow2, though; maybe Vladimir can
> help here). But ultimately, you'll always get some I/O requests in
> .bdrv_close and they could fail even if we made it less likely.

Let me try to summarize to make sure I understand.

Closing an image can fail for the same reason close() can fail: flushing
caches can fail, and not caching is not an option.
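
It's the familiar situation with plain close():

    if (close(fd) < 0) {
        /* e.g. EIO or ENOSPC from flushing delayed writes; the descriptor
         * is gone either way, so there is little left to retry */
    }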

The close is commonly hidden within a bdrv_unref().  It closes when the
last reference goes away.

Sometimes we know which bdrv_unref() will close.  Sometimes we don't.

Some bdrv_unref() callers can report errors sanely.  Others simply
can't.

Some failures to close can be safely ignored, such as closing a
temporary image that is going away anyway.  But it's hard to tell when
this is the case.

Ideally, things fail cleanly: we either do what's asked and succeed, or
do nothing and fail.  A failure to close is commonly unclean.  So, even
if we can report it, recovery can be hard or impossible.


A common criticism of garbage collection is that finalization is delayed
and runs "out of context".  The above shows that reference counting
isn't all that much better.

We could have two variants of bdrv_unref(), one that must not fail, and
one that can fail and must be checked.  But as you explained, ensuring
failure only happens in places where we can handle an error sanely is
somewhere between hard and impossible.
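
Something like this (the second prototype is purely hypothetical):

    /* Never fails; a close error can at most be logged. */
    void bdrv_unref(BlockDriverState *bs);

    /* May fail when it drops the last reference; caller must check. */
    int bdrv_unref_or_fail(BlockDriverState *bs, Error **errp);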

No better ideas, I'm afraid.



