qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 17/19] replay: add BH oneshot event for block


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v7 17/19] replay: add BH oneshot event for block layer
Date: Fri, 30 Nov 2018 12:18:08 +0100
User-agent: Mutt/1.10.1 (2018-07-13)

Am 30.11.2018 um 09:21 hat Pavel Dovgalyuk geschrieben:
> > From: Kevin Wolf [mailto:address@hidden
> > Am 10.10.2018 um 15:35 hat Pavel Dovgalyuk geschrieben:
> > > Replay is capable of recording normal BH events, but sometimes
> > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > function. This patch enables recording and replaying such callbacks.
> > > Block layer uses these events for calling the completion function.
> > > Replaying these calls makes the execution deterministic.
> > >
> > > Signed-off-by: Pavel Dovgalyuk <address@hidden>
> > 
> > What are the rules for using aio_bh_schedule_oneshot() vs.
> > replay_bh_schedule_oneshot_event()? You are modifying one place, but if
> > I grep for aio_bh_schedule_oneshot(), I find many other places that
> > use it. Why is this one place special?
> 
> This one is special, because it caused a record/replay bug.  I can't
> validate all other places, because I don't understand block layer good
> enough.
>
> That's why our current strategy is fixing replay, when it is breaks.
> It's too hard to create the test for verifying the modification. And
> for the current patch there was the bug which was fixed.

So nobody really understands the code, but we're just fixing symptoms
whenever something crashes, without ever thinking about the design? And
then the code will organically grow to maybe approximate what we wanted
it to do?

Honestly, that's not the way to build reliable and maintainable
software.

> The rule is the following: when the event is asynchronous and its
> finalization affects the guest state, then this event should be
> processed by the record/replay queue.

Are there any BHs that can never affect the guest state?

Maybe you should really intercept this inside qemu_bh_schedule() and
aio_bh_schedule_oneshot() to catch all instances. This would look more
likely to address the root cause rather than twenty different special
cases and forgetting the other hundred instances.

But why do you even need to record and replay BHs? Aren't they already
fully deterministic if the code that schedules them is deterministic? Is
this hinting at problems in a different part of the code, so that the
caller isn't deterministic even though we expect it to be?

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]