Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs

From: Paolo Bonzini
Subject: Re: [Qemu-devel] nbd: Possible regression in 2.9 RCs
Date: Wed, 5 Apr 2017 23:13:25 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 05/04/2017 13:01, Kevin Wolf wrote:
> On 04.04.2017 at 17:09, Paolo Bonzini wrote:
>> On 04/04/2017 16:53, Kevin Wolf wrote:
>>>> The big question is how this fits into release management.  We have
>>>> another important regression from the op blocker work and only a week
>>>> to go before the last rc.  Are we going to delay 2.9 arbitrarily?  Are
>>>> we going to shorten the 2.10 development period correspondingly?  (I
>>>> vote yes and yes, FWIW).
>>> Which is the other regression?
>> The assertion failure for snapshot_blkdev with iothreads.
> Ah, right, I keep forgetting that this started appearing with the op
> blocker series because the failure mode is completely different, so it
> seems to have been a latent bug somewhere else that was uncovered by it.
> If we're sure that the change of the order in bdrv_append() is what
> caused the bug to appear, we can just undo that for 2.9, at the cost of
> a messed up graph in the error case when bdrv_set_backing_hd() fails
> (because we have no way to undo bdrv_replace_node()).

I don't know if that is enough to fix all of the issues, but the bug is
easy to reproduce.

The issue is that we don't understand what moving a node in the graph
does to quiesce_counter.  The invariant, I think, is that a child can
never have a lower quiesce_counter than any of its parents (paths in the
graph can only join in the direction of the children, right?).  Is this
checked anywhere, and are there already violations?  Maybe we need a
get_quiesce_counter method in BdrvChildRole, to cover BlockBackend's
quiesce_counter?  Then we can use that information to adjust
quiesce_counter when nodes move in the graph.
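A minimal C sketch of the invariant and of adjusting counters on node movement, under a simplified model: draining a node recursively drains its children, so a node's counter equals its own drained sections plus the counters of all its parents.  Node, adjust_quiesce(), attach_child(), move_child() and check_quiesce_invariant() are hypothetical names for illustration only, not QEMU's BlockDriverState, bdrv_drained_begin() or BdrvChildRole, and the proposed get_quiesce_counter method is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CHILDREN 4

/* Hypothetical simplified node; not QEMU's BlockDriverState. */
typedef struct Node {
    const char *name;
    int quiesce_counter;
    struct Node *children[MAX_CHILDREN];
    int n_children;
} Node;

/* Apply delta to n and to every descendant, once per path, the way a
 * recursive drain would in this model. */
static void adjust_quiesce(Node *n, int delta)
{
    n->quiesce_counter += delta;
    for (int i = 0; i < n->n_children; i++) {
        adjust_quiesce(n->children[i], delta);
    }
}

static void drained_begin(Node *n) { adjust_quiesce(n, +1); }
static void drained_end(Node *n)   { adjust_quiesce(n, -1); }

/* Raw edge manipulation, with no counter bookkeeping. */
static void link_child(Node *parent, Node *child)
{
    assert(parent->n_children < MAX_CHILDREN);
    parent->children[parent->n_children++] = child;
}

static void unlink_child(Node *parent, Node *child)
{
    for (int i = 0; i < parent->n_children; i++) {
        if (parent->children[i] == child) {
            parent->children[i] = parent->children[--parent->n_children];
            return;
        }
    }
    assert(!"child is not attached to this parent");
}

/* A new child inherits the drained sections active on its parent... */
static void attach_child(Node *parent, Node *child)
{
    link_child(parent, child);
    adjust_quiesce(child, parent->quiesce_counter);
}

/* ...and gives them back when it is detached. */
static void detach_child(Node *parent, Node *child)
{
    unlink_child(parent, child);
    adjust_quiesce(child, -parent->quiesce_counter);
}

static void move_child(Node *old_parent, Node *new_parent, Node *child)
{
    detach_child(old_parent, child);
    attach_child(new_parent, child);
}

/* The buggy variant: edges move, counters keep the old position. */
static void move_child_unadjusted(Node *old_parent, Node *new_parent,
                                  Node *child)
{
    unlink_child(old_parent, child);
    link_child(new_parent, child);
}

/* Invariant: no child has a lower quiesce_counter than a parent. */
static bool check_quiesce_invariant(Node **nodes, int n_nodes)
{
    bool ok = true;
    for (int i = 0; i < n_nodes; i++) {
        const Node *p = nodes[i];
        for (int j = 0; j < p->n_children; j++) {
            const Node *c = p->children[j];
            if (c->quiesce_counter < p->quiesce_counter) {
                fprintf(stderr, "violation: %s=%d below parent %s=%d\n",
                        c->name, c->quiesce_counter,
                        p->name, p->quiesce_counter);
                ok = false;
            }
        }
    }
    return ok;
}
```

With move_child_unadjusted(), a child moved under a drained parent keeps a counter of 0, and the matching drained_end() on the parent drives it to -1; check_quiesce_invariant() flags the state before that imbalance surfaces somewhere else.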

The block layer has good tests, but as the internal logic grows more
complex we should probably have more C-level tests.  I'm constantly
impressed by the number of tricky cases that test-replication.c catches
in the block job code.
