qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v7 17/19] replay: add BH oneshot event for block


From: Pavel Dovgalyuk
Subject: Re: [Qemu-devel] [PATCH v7 17/19] replay: add BH oneshot event for block layer
Date: Fri, 30 Nov 2018 14:26:55 +0300

> From: Kevin Wolf [mailto:address@hidden
> Am 30.11.2018 um 09:21 hat Pavel Dovgalyuk geschrieben:
> > > From: Kevin Wolf [mailto:address@hidden
> > > Am 10.10.2018 um 15:35 hat Pavel Dovgalyuk geschrieben:
> > > > Replay is capable of recording normal BH events, but sometimes
> > > > there are single use callbacks scheduled with aio_bh_schedule_oneshot
> > > > function. This patch enables recording and replaying such callbacks.
> > > > Block layer uses these events for calling the completion function.
> > > > Replaying these calls makes the execution deterministic.
> > > >
> > > > Signed-off-by: Pavel Dovgalyuk <address@hidden>
> > >
> > > What are the rules for using aio_bh_schedule_oneshot() vs.
> > > replay_bh_schedule_oneshot_event()? You are modifying one place, but if
> > > I grep for aio_bh_schedule_oneshot(), I find many other places that
> > > use it. Why is this one place special?
> >
> > This one is special, because it caused a record/replay bug.  I can't
> > validate all other places, because I don't understand block layer good
> > enough.
> >
> > That's why our current strategy is fixing replay, when it is breaks.
> > It's too hard to create the test for verifying the modification. And
> > for the current patch there was the bug which was fixed.
> 
> So nobody really understands the code, but we're just fixing symptoms
> whenever something crashes, without ever thinking about the design? And
> then the code will organically grow to maybe approximate what we wanted
> it to do?
> 
> Honestly, that's not the way to build reliable and maintainable
> software.
> 
> > The rule is the following: when the event is asynchronous and its
> > finalization affects the guest state, then this event should be
> > processed by the record/replay queue.
> 
> Are there any BHs that can never affect the guest state?

For example, qemu_bh_schedule(qmp_dispatcher_bh) in handle_qmp_command.
thread-pool.c also uses BH callbacks, but they are for creating threads,
and not for modifying the replayed guest state.

> Maybe you should really intercept this inside qemu_bh_schedule() and
> aio_bh_schedule_oneshot() to catch all instances. This would look more
> likely to address the root cause rather than twenty different special
> cases and forgetting the other hundred instances.
> 
> But why do you even need to record and replay BHs? Aren't they already
> fully deterministic if the code that schedules them is deterministic? Is
> this hinting at problems in a different part of the code, so that the
> caller isn't deterministic even though we expect it to be?

BHs may be invoked at random moments, that are not related to the executed
instructions. As we replay CPU, memory, and peripheral device state, we
must replay all invocations all BHs that affect them.

Replay must be done precisely, with instruction-level granularity.
That is why we record the moment of BH invocation for matching it in
the replay phase.

Pavel Dovgalyuk




reply via email to

[Prev in Thread] Current Thread [Next in Thread]