[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC v3 01/27] char-io: fix possible race on IOWatchPol
From: |
Stefan Hajnoczi |
Subject: |
Re: [Qemu-devel] [RFC v3 01/27] char-io: fix possible race on IOWatchPoll |
Date: |
Tue, 14 Nov 2017 10:32:19 +0000 |
User-agent: |
Mutt/1.9.1 (2017-09-22) |
On Tue, Nov 14, 2017 at 02:09:39PM +0800, Peter Xu wrote:
> On Mon, Nov 13, 2017 at 04:52:11PM +0000, Stefan Hajnoczi wrote:
> > On Mon, Nov 06, 2017 at 05:46:17PM +0800, Peter Xu wrote:
> > > This is not a problem if we are only having one single loop thread like
> > > before. However, after per-monitor thread is introduced, this is not
> > > true any more, and the race can happen.
> > >
> > > The race can be triggered with "make check -j8" sometimes:
> >
> > Please mention a specific test case that fails.
>
> It was any of the check-qtest-$(TARGET)s that failed. I'll mention
> that in next post.
>
> >
> > >
> > > qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:
> > > io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.
> > >
> > > This patch keeps the reference for the watch object when creating in
> > > io_add_watch_poll(), so that the object will never be released in the
> > > context main loop, especially when the context loop is running in
> > > another standalone thread. Meanwhile, when we want to remove the watch
> > > object, we always first detach the watch object from its owner context,
> > > then we continue with the cleanup.
> > >
> > > Without this patch, calling io_remove_watch_poll() in main loop thread
> > > is not thread-safe, since the other per-monitor thread may be modifying
> > > the watch object at the same time.
> > >
> > > Reviewed-by: Marc-André Lureau <address@hidden>
> > > Signed-off-by: Peter Xu <address@hidden>
> > > ---
> > > chardev/char-io.c | 16 ++++++++++++++--
> > > 1 file changed, 14 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/chardev/char-io.c b/chardev/char-io.c
> > > index f81052481a..50b5bac704 100644
> > > --- a/chardev/char-io.c
> > > +++ b/chardev/char-io.c
> > > @@ -122,7 +122,6 @@ GSource *io_add_watch_poll(Chardev *chr,
> > > g_free(name);
> > >
> > > g_source_attach(&iwp->parent, context);
> > > - g_source_unref(&iwp->parent);
> > > return (GSource *)iwp;
> > > }
> > >
> > > @@ -131,12 +130,25 @@ static void io_remove_watch_poll(GSource *source)
> > > IOWatchPoll *iwp;
> > >
> > > iwp = io_watch_poll_from_source(source);
> > > +
> > > + /*
> > > + * Here the order of destruction really matters. We need to first
> > > + * detach the IOWatchPoll object from the context (which may still
> > > + * be running in another loop thread), only after that could we
> > > + * continue to operate on iwp->src, or there may be race condition
> > > + * between current thread and the context loop thread.
> > > + *
> > > + * Let's blame the glib bug mentioned in commit 2b316774f6
> > > + * ("qemu-char: do not operate on sources from finalize
> > > + * callbacks") for this extra complexity.
> >
> > I don't understand how this bug is to blame. Isn't the problem here a
> > race condition between two QEMU threads?
>
> Yes, it is.
>
> The problem is, we won't have the race condition if glib does not have
> that bug mentioned. Then the thread running GMainContext will have
> full control of iwp->src destruction, and destruction of it would be
> fairly straightforward (unref iwp->src in IOWatchPoll destructor).
> Now IIUC we are doing this in a hacky way, say, we destroy iwp->src
> explicitly from main thread before quitting (see [1] below, the whole
> if clause).
>
> >
> > Why are two threads accessing the watch at the same time?
>
> Here is how I understand:
>
> Firstly we need to tackle with that bug, by an explicit destruction of
> iwp->src below; meanwhile when we are destroying it, the GMainContext
> can still be running somewhere (it's not happening in current series
> since I stopped iothread earlier than this point, however it can still
> happen if in the future we don't do that), then we possibly want this
> patch.
>
> Again, without this patch, current series should work; however I do
> hope this patch can be in, in case someday we want to provide complete
> thread safety for Chardevs (now it is not really thread-safe).
You said qtests fail with "Assertion `iwp->src == NULL' failed" but then
you said "without this patch, current series should work". How do you
reproduce the failure if it doesn't occur?
It looks like remove_fd_in_watch() -> io_remove_watch_poll() callers
fall into two categories: called from within the event loop and called
when a chardev is destroyed. Do the thread-safety issues occur when the
chardev is destroyed by the QEMU main loop thread? Or did I miss cases
where remove_fd_in_watch() is called from other threads?
>
> >
> > > + */
> > > + g_source_destroy(&iwp->parent);
> > > if (iwp->src) {
> > > g_source_destroy(iwp->src);
> > > g_source_unref(iwp->src);
> > > iwp->src = NULL;
> > > }
>
> [1]
>
> > > - g_source_destroy(&iwp->parent);
> > > + g_source_unref(&iwp->parent);
> > > }
> > >
> > > void remove_fd_in_watch(Chardev *chr)
> > > --
> > > 2.13.5
> > >
>
> Thanks,
>
> --
> Peter Xu
signature.asc
Description: PGP signature
[Qemu-devel] [RFC v3 02/27] qobject: introduce qstring_get_try_str(), Peter Xu, 2017/11/06
[Qemu-devel] [RFC v3 03/27] qobject: introduce qobject_get_try_str(), Peter Xu, 2017/11/06
[Qemu-devel] [RFC v3 04/27] qobject: let object_property_get_str() use new API, Peter Xu, 2017/11/06
[Qemu-devel] [RFC v3 05/27] monitor: move skip_flush into monitor_data_init, Peter Xu, 2017/11/06