qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] replay: synchronize on every virtual timer callback


From: Pavel Dovgalyuk
Subject: Re: [PATCH] replay: synchronize on every virtual timer callback
Date: Tue, 19 May 2020 08:56:55 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0


On 18.05.2020 18:56, Alex Bennée wrote:
Philippe Mathieu-Daudé <address@hidden> writes:

+ Alex

On 5/6/20 10:17 AM, Pavel Dovgalyuk wrote:
Sometimes virtual timer callbacks depend on order
of virtual timer processing and warping of virtual clock.
Therefore every callback should be logged to make replay deterministic.
This patch creates a checkpoint before every virtual timer callback.
With these checkpoints virtual timers processing and clock warping
events order is completely deterministic.
Signed-off-by: Pavel Dovgalyuk <address@hidden>
---
   util/qemu-timer.c |    5 +++++
   1 file changed, 5 insertions(+)
diff --git a/util/qemu-timer.c b/util/qemu-timer.c
index d548d3c1ad..47833f338f 100644
--- a/util/qemu-timer.c
+++ b/util/qemu-timer.c
@@ -588,6 +588,11 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
           qemu_mutex_lock(&timer_list->active_timers_lock);
             progress = true;
+        /*
+         * Callback may insert new checkpoints, therefore add new checkpoint
+         * for the virtual timers.
+         */
+        need_replay_checkpoint = timer_list->clock->type == QEMU_CLOCK_VIRTUAL;
       }
       qemu_mutex_unlock(&timer_list->active_timers_lock);
So the problem I have with this as with all the record/replay stuff I
need want to review is it's very hard to see things in action. I added a
*very* basic record/replay test to the aarch64 softmmu tests but they
won't exercise any of this code because no timers get fired. I'm
assuming the sort of tests that is really needed is something that not
only causes QEMU_CLOCK_VIRTUAL timers to fire and trigger logged HW
events and ensure that things don't get confused in the process.

I encounter most of the bugs in different OS boot scenarios.

We also have internal tests that include some computational, disk, and network interaction tasks.

Is it possible to add a test like booting a "real" OS and replaying it?


If I read up the file I just get more questions than answers. For
example why do we release the qemu_timers lock before processing the
replay event? Is it that the replay event could cause another timer to

We release the lock, because accessing the replay module may process some events and add more timers.

be consumed? That seems suspect to me given we should only be expiring
times in the run loop.

Could the code be re-factored to use QEMU_LOCK_GUARD? It's hard to know
and I really wouldn't want to try that re-factoring without some sort of
confidence we were properly exercising the semantics of record/replay
and alive to potential regressions.

QEMU_LOCK_GUARD looks nice. But we'll still need unlock/lock pairs around checkpoint and timer callback.


Please realise I do like the concept of record/replay and I'd love to
get more features merged (like for example the reverse debug patches).
However by it's very nature it gets it's fingers deeply intertwined with
the main run loop and we really need to better exercise the code in our
tests.

FWIW you can have an:

Acked-by: Alex Bennée <address@hidden>

Thanks.


which means it doesn't look obviously broken to me and it doesn't seem
to break the non-record/replay cases because that's all I can really
test.





reply via email to

[Prev in Thread] Current Thread [Next in Thread]