Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/repl


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 3/3] replay: introduce block devices record/replay
Date: Mon, 15 Feb 2016 10:38:10 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 15.02.2016 at 10:14, Pavel Dovgalyuk wrote:
> > From: Pavel Dovgalyuk [mailto:address@hidden
> > > From: Kevin Wolf [mailto:address@hidden
> > > > >
> > > > > int blkreplay_co_readv()
> > > > > {
> > > > >     BlockReplayState *s = bs->opaque;
> > > > >     int reqid = s->reqid++;
> > > > >
> > > > >     bdrv_co_readv(bs->file, ...);
> > > > >
> > > > >     if (mode == record) {
> > > > >         log(reqid, time);
> > > > >     } else {
> > > > >         assert(mode == replay);
> > > > > >         bool *done = req_replayed_list_get(reqid);
> > > > >         if (done) {
> > > > >             *done = true;
> > > > >         } else {
> > > > point A
> > > > >             req_completed_list_insert(reqid, qemu_coroutine_self());
> > > > >             qemu_coroutine_yield();
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > /* called by replay.c */
> > > > > int blkreplay_run_event()
> > > > > {
> > > > >     if (mode == replay) {
> > > > >         co = req_completed_list_get(e.reqid);
> > > > >         if (co) {
> > > > >             qemu_coroutine_enter(co);
> > > > >         } else {
> > > > >             bool done = false;
> > > > > >             req_replayed_list_insert(e.reqid, &done);
> > > > point B
> > > > >             /* wait synchronously for completion */
> > > > >             while (!done) {
> > > > >                 aio_poll();
> > > > >             }
> > > > >         }
> > > > >     }
> > > > > }
> > > >
> > > > One more question about coroutines.
> > > > Are race conditions possible in this sample?
> > > > In replay mode we may call readv and reach point A.
> > > > At the same time, we will reach point B in another thread.
> > > > Then readv will yield and nobody will resume it?
> > >
> > > There are two aspects to this:
> > >
> > > * Real multithreading doesn't exist in the block layer. All block driver
> > >   functions are only called with the mutex in the AioContext held. There
> > >   is exactly one AioContext per BDS, so no two threads can possibly be
> > >   operating on the same BDS at the same time.
> > >
> > > * Coroutines are different from threads in that they aren't preemptive.
> > >   They are only interrupted in places where they explicitly yield.
> > >
> > > Of course, in order for this to work, we actually need to take the mutex
> > > before calling blkreplay_run_event(), which is called directly from the
> > > replay code (which runs in the mainloop thread? Or vcpu?).
> > 
> > blkreplay_run_event() is called from replay code, which is protected by a
> > mutex.
> > This function may be called from the I/O and vcpu threads, because both of
> > them invoke replay functions.
> 
> Now I've encountered a situation where blkreplay_run_event is called from
> a read coroutine:
> bdrv_prwv_co -> aio_poll -> qemu_clock_get_ns -> replay_read_clock -> blkreplay_run_event
>            \--> bdrv_co_readv -> blkreplay_co_readv -> bdrv_co_readv (lower layer)
> 
> bdrv_co_readv inside blkreplay_co_readv can't proceed in this situation.
> This is probably because aio_poll has taken the aio context?
> How can I resolve this?

First of all, I'm not sure if running replay events from
qemu_clock_get_ns() is such a great idea. This is not a function that
callers expect to change any state. If you absolutely have to do it
there instead of in the clock device emulations, maybe restricting it to
replaying clock events could make it somewhat less harmful.
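
Purely as an illustration of that restriction, here is a minimal sketch. It
assumes a simplified in-memory event log; the ToyEvent and ToyReplayLog types
and both toy_* functions are hypothetical and are not the real replay code.
The clock path only ever consumes clock events, and block events are
dispatched from a separate main-loop path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event kinds; not the actual replay log format. */
typedef enum { EV_CLOCK, EV_BLOCK } ToyEventKind;

typedef struct {
    ToyEventKind kind;
    long value;              /* clock value, or a request id for EV_BLOCK */
} ToyEvent;

typedef struct {
    const ToyEvent *events;
    size_t count;
    size_t pos;              /* index of the next unconsumed event */
    int block_events_run;    /* how many block events were dispatched */
} ToyReplayLog;

/*
 * Clock read path: consume the next event only if it is a clock event.
 * Anything else is left in place, so reading the clock never runs block
 * callbacks. Returns false if the next event is not a clock event.
 */
static bool toy_replay_read_clock(ToyReplayLog *log, long *out)
{
    if (log->pos < log->count && log->events[log->pos].kind == EV_CLOCK) {
        *out = log->events[log->pos++].value;
        return true;
    }
    return false;
}

/* Main-loop path: the only place that dispatches non-clock events. */
static void toy_replay_run_events(ToyReplayLog *log)
{
    while (log->pos < log->count && log->events[log->pos].kind != EV_CLOCK) {
        log->block_events_run++;   /* stand-in for blkreplay_run_event() */
        log->pos++;
    }
}
```

What the clock path should do when the next logged event is not a clock event
is left open here; the sketch only reports it to the caller.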

Anyway, what does "can't proceed" mean? The coroutine yields because
it's waiting for I/O, but it is never reentered? Or is it hanging while
trying to acquire a lock?

Calling the callbacks that reenter a yielded coroutine is generally the
job of aio_poll(). After reentering the coroutine, blkreplay_run_event()
should return to its caller and therefore indirectly to aio_poll(),
which should drive the events. Sounds like it should be working.
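
The expected flow can be sketched with a toy model. This is not QEMU's
coroutine or AioContext code: it uses POSIX ucontext, every toy_* name is made
up for illustration, and a single flag stands in for a queued completion
callback. The request coroutine yields while waiting for I/O, and the poll
function is the only thing that reenters it:

```c
#include <assert.h>
#include <stdbool.h>
#include <ucontext.h>

/* Toy stand-ins; none of these are QEMU APIs. */
static ucontext_t loop_ctx, co_ctx;
static char co_stack[64 * 1024];
static bool io_pending;      /* a completion callback is queued */
static bool co_finished;     /* the request coroutine ran to the end */

/* Switch from the coroutine back to the loop. */
static void toy_yield(void) { swapcontext(&co_ctx, &loop_ctx); }
/* Switch from the loop into the coroutine. */
static void toy_enter(void) { swapcontext(&loop_ctx, &co_ctx); }

/* Request coroutine: submit I/O, yield until completion, then finish. */
static void toy_co_readv(void)
{
    io_pending = true;       /* "submit" the request */
    toy_yield();             /* wait for I/O */
    co_finished = true;      /* runs only after being reentered */
}

/* Like aio_poll() in spirit: run one pending completion, report progress. */
static bool toy_aio_poll(void)
{
    if (!io_pending) {
        return false;
    }
    io_pending = false;
    toy_enter();             /* the completion reenters the coroutine */
    return true;
}

/* Start the coroutine, then drive it to completion from the loop. */
static int toy_run(void)
{
    int polls = 0;

    io_pending = co_finished = false;
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &loop_ctx;   /* resume the loop when the coroutine ends */
    makecontext(&co_ctx, toy_co_readv, 0);

    toy_enter();                  /* runs until the first yield */
    while (!co_finished) {
        if (!toy_aio_poll()) {
            return -1;            /* nothing left to reenter it: a hang */
        }
        polls++;
    }
    return polls;                 /* number of poll iterations needed */
}
```

In this model toy_run() needs exactly one poll iteration. A hang would show up
as the -1 path: the coroutine has yielded, but no completion is queued that
could reenter it.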

Can you provide more detail about the exact place where it's hanging,
both in the coroutine and in the main "coroutine" that executes
aio_poll()?

Kevin


