From: Pavel Dovgalyuk
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Wed, 10 Feb 2016 15:51:21 +0300

> From: Kevin Wolf [mailto:address@hidden]
> On 10.02.2016 at 13:05, Pavel Dovgalyuk wrote:
> > > On 09.02.2016 at 12:52, Pavel Dovgalyuk wrote:
> > > > > From: Kevin Wolf [mailto:address@hidden]
> > > > > But even this doesn't feel completely right, because block drivers are
> > > > > already layered and there is no need to hardcode something optional (and
> > > > > rarely used) in the hot code path that could just be another layer.
> > > > >
> > > > > I assume that you know beforehand if you want to replay something, so
> > > > > requiring you to configure your block devices with a replay driver on
> > > > > top of the stack seems reasonable enough.
> > > >
> > > > I cannot use block drivers for this. When driver functions are called,
> > > > QEMU is already running coroutines (and has probably started bottom
> > > > halves). Coroutines make execution non-deterministic.
> > > > That's why we have to intercept the blk_aio_ functions, which are called
> > > > deterministically.
> > >
> > > What does "deterministic" mean in this context, i.e. what are your exact
> > > requirements?
> >
> > "Deterministic" means that the replayed execution should run exactly
> > the same guest instructions, in the same sequence, as in the recording
> > session.
> 
> Okay. I think with this we can do better than what you have now.
> 
> > > I don't think that coroutines introduce anything non-deterministic per
> > > se. Depending on what you mean by it, the block layer code paths in
> > > block.c may contain problematic code.
> >
> > They are non-deterministic if we need instruction-level accuracy.
> > Thread switching (and therefore callbacks and BH execution) is
> > non-deterministic.
> 
> Thread switching depends on an external event (the kernel scheduler
> deciding to switch), so agreed, if a thread switch ever influences what
> the guest sees, that would be a problem.
> 
> Generally, however, callbacks and BHs don't involve a thread switch at
> all (BHs can be invoked from a different thread in theory, but we have
> very few of those cases and they shouldn't be visible for the guest).
> The same is true for coroutines, which are semantically equivalent to
> callbacks.
> 
> > In two different executions these callbacks may happen at different moments
> > of time (counting in number of executed instructions).
> > All operations with virtual devices (including memory, interrupt controller,
> > and disk drive controller) should happen at deterministic moments of time
> > to be replayable.
> 
> Right, so let's talk about what this external non-deterministic event
> really is.
> 
> I think the only thing whose timing is unknown in the block layer is the
> completion of I/O requests. This non-determinism comes from the time the
> I/O syscalls made by the lowest layer (usually raw-posix) take.

Right.

> This means that we can add logic to remove the non-determinism at the
> point of our choice between raw-posix and the guest device emulation. A
> block driver on top is as good as anything else.
> 
> While recording, this block driver would just pass the request to next
> lower layer (starting a request is deterministic, so it doesn't need to
> be logged) and once the request completes it logs it. While replaying,
> the completion of requests is delayed until we read it in the log; if we
> read it in the log and the request hasn't completed yet, we do a busy
> wait for it (while(!completed) aio_poll();).
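
To make the proposed scheme concrete, here is a toy model of it in plain C.
It is only a sketch under stated assumptions: Backend, backend_poll(),
record_run() and replay_run() are hypothetical names invented for this
illustration, not QEMU APIs. backend_poll() merely stands in for aio_poll(),
and request latencies are simulated as a number of poll ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REQS 8

/* Hypothetical stand-in for the lower block layer: request i completes
 * after ticks_left[i] calls to backend_poll(). */
typedef struct {
    int ticks_left[MAX_REQS];
    bool completed[MAX_REQS];
    bool delivered[MAX_REQS];
    size_t nreqs;
} Backend;

static void backend_init(Backend *b, const int *latency, size_t n)
{
    assert(n <= MAX_REQS);
    b->nreqs = n;
    for (size_t i = 0; i < n; i++) {
        b->ticks_left[i] = latency[i];
        b->completed[i] = false;
        b->delivered[i] = false;
    }
}

/* Toy analogue of aio_poll(): advance every in-flight request by one tick. */
static void backend_poll(Backend *b)
{
    for (size_t i = 0; i < b->nreqs; i++) {
        if (!b->completed[i] && --b->ticks_left[i] <= 0) {
            b->completed[i] = true;
        }
    }
}

/* Recording: deliver completions as they happen and log their order.
 * Starting a request is deterministic, so only completions are logged. */
static size_t record_run(Backend *b, int *trace)
{
    size_t nlog = 0;
    while (nlog < b->nreqs) {
        backend_poll(b);
        for (size_t i = 0; i < b->nreqs; i++) {
            if (b->completed[i] && !b->delivered[i]) {
                b->delivered[i] = true;
                trace[nlog++] = (int)i;
            }
        }
    }
    return nlog;
}

/* Replay: deliver completions strictly in logged order. If the logged
 * request has not finished yet, busy-wait for it; requests that finish
 * early are simply held back until their log entry is reached. */
static size_t replay_run(Backend *b, const int *trace, size_t nlog, int *order)
{
    for (size_t k = 0; k < nlog; k++) {
        int id = trace[k];
        while (!b->completed[id]) {
            backend_poll(b);
        }
        b->delivered[id] = true;
        order[k] = id;
    }
    return nlog;
}
```

In this model the requests stay concurrent in both modes; only the order in
which their completions are delivered is pinned to the log, so different
I/O latencies during replay do not change what the guest observes.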

I tried serializing all bottom halves and worker thread callbacks in a
previous version of the patches. That code was much more complicated
and error-prone than the current version. We had to classify all bottom
halves as recorded or non-recorded (because sometimes they are used
for QEMU's own purposes, not the guest's).

However, I don't yet understand which layer you are proposing as the
candidate for record/replay. Which functions should be changed?
I would like to investigate this approach, but I don't get it yet.

> This model would get rid of the bdrv_drain_all() that you call
> everywhere and therefore allow concurrent requests, giving a result that
> is much closer to the "normal" behaviour without replay.
> 
> > > The block layer uses bottom halves in some cases for request completion,
> > > but not until calling into the first driver (why would they be a
> > > problem?). What could happen is that a request is serialised and
> > > therefore delayed in some special configurations, which sounds a lot
> > > like what you wanted to avoid.
> >
> > Drivers cannot distinguish requests from the guest CPU from those issued
> > by savevm/loadvm.
> > The former have to be deterministic, because they affect guest memory,
> > the virtual disk controller, and interrupts.
> 
> Sure they can, these are two different callbacks. But even if they
> couldn't, making more things than necessary deterministic might be
> wasteful, but not really harmful.

Is there any universal way to check this?

Pavel Dovgalyuk