Re: [Qemu-block] [PATCH] vl: pause vcpus before stopping iothreads

From: Stefan Hajnoczi
Subject: Re: [Qemu-block] [PATCH] vl: pause vcpus before stopping iothreads
Date: Wed, 31 Jan 2018 13:56:28 +0000

On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> On 30.01.2018 at 16:38, Stefan Hajnoczi wrote:
> > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > before main() quits") introduced iothread_stop_all() to avoid the
> > following virtio-scsi assertion failure:
> > 
> >   assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> > 
> > Back then the assertion failed because when bdrv_close_all() made
> > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > instead of s->ctx.
> > 
> > The same assertion can still fail today when vcpus submit new I/O
> > requests after iothread_stop_all() has moved the BDS to the global
> > AioContext.
> > 
> > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > before calling iothread_stop_all().
> > 
> > Note that the assertion failure is a race condition.  It is not possible
> > to reproduce it reliably.
> > 
> > Signed-off-by: Stefan Hajnoczi <address@hidden>
> 
> Does pausing the vcpus actually make sure that the iothread isn't active
> any more, or do we still have a small window where the vcpu is already
> stopped, but the iothread is still processing requests?
> 
> Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> either does not have any effect, or if it does have an effect, it's
> wrong. You can't just force an in-use BDS into a different AioContext
> when the user that set the AioContext is still there.
> 
> At the very least, do we need a blk_drain_all() before stopping the
> iothreads?

bdrv_set_aio_context() contains aio_disable_external() +
bdrv_parent_drained_begin() + bdrv_drain(bs).  This should complete all
requests, even those sitting in a descriptor ring that hasn't been
processed yet.
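
To make the ordering concrete, here is a rough toy model in C.  It is
not QEMU code: every toy_* name and the struct are hypothetical
stand-ins, and the real functions do considerably more.  It only shows
the sequence described above (disable external sources, quiesce the
parent, drain) running before the context switch, plus why a request
arriving afterwards would trip the virtio-scsi assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the QEMU types or functions. */
enum toy_ctx { TOY_CTX_IOTHREAD, TOY_CTX_MAIN };

struct toy_bds {
    enum toy_ctx ctx;        /* context that currently owns the node */
    bool external_disabled;  /* models the effect of aio_disable_external() */
    bool parent_quiesced;    /* models bdrv_parent_drained_begin() */
    size_t ring_pending;     /* descriptors in the ring, not yet picked up */
    size_t in_flight;        /* requests already being processed */
    size_t completed;        /* requests that have finished */
};

/* Models bdrv_drain(): pick up queued descriptors, then finish everything. */
static void toy_drain(struct toy_bds *bs)
{
    while (bs->ring_pending > 0) {
        bs->ring_pending--;
        bs->in_flight++;
    }
    while (bs->in_flight > 0) {
        bs->in_flight--;
        bs->completed++;
    }
}

/* Models the sequence inside bdrv_set_aio_context() as described above. */
static void toy_set_aio_context(struct toy_bds *bs, enum toy_ctx new_ctx)
{
    bs->external_disabled = true;
    bs->parent_quiesced = true;
    toy_drain(bs);

    /* Nothing may still be pending when the node changes context. */
    assert(bs->ring_pending == 0 && bs->in_flight == 0);
    bs->ctx = new_ctx;

    bs->parent_quiesced = false;
    bs->external_disabled = false;
}

/* Models the check assert(blk_get_aio_context(d->conf.blk) == s->ctx). */
static bool toy_ctx_matches(const struct toy_bds *bs, enum toy_ctx dataplane_ctx)
{
    return bs->ctx == dataplane_ctx;
}
```

In this model, a node with 3 queued and 2 in-flight requests ends up
with all 5 completed and nothing pending after toy_set_aio_context().
toy_ctx_matches(&bs, TOY_CTX_IOTHREAD) is then false, which is the
condition a still-running vcpu could hit by submitting new I/O after
the move.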

> It would still just be a hack, the proper way seems to be
> getting the virtio device out of dataplane mode so that the iothread is
> actually unused and doesn't just happen to not process something at the
> moment.

Agreed, the existing approach is a hack.  I'm not keen on implementing
a proper device<->IOThread detach operation because vl.c:main() seems to
be the only place that needs it - and it can get away with just
quiescing requests and the IOThread instead.
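
The shutdown ordering this relies on can be sketched with another toy
model in C.  Again, this is not the patch and not QEMU code: struct
toy_vm and all toy_* functions are hypothetical.  The point is only
that once vcpus are paused first, no new request can be submitted in
the window after the IOThreads have stopped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the relevant VM state; NOT a QEMU structure. */
struct toy_vm {
    bool vcpus_running;
    bool iothreads_running;
    bool block_layer_open;
    int late_submissions;   /* requests submitted after iothreads stopped */
};

/* A vcpu can only submit I/O while it is running. */
static bool toy_vcpu_submit(struct toy_vm *vm)
{
    if (!vm->vcpus_running) {
        return false;
    }
    if (!vm->iothreads_running) {
        vm->late_submissions++;  /* the racy case from the commit message */
    }
    return true;
}

static void toy_pause_vcpus(struct toy_vm *vm)
{
    vm->vcpus_running = false;
}

static void toy_stop_iothreads(struct toy_vm *vm)
{
    vm->iothreads_running = false;
}

static void toy_close_block_layer(struct toy_vm *vm)
{
    vm->block_layer_open = false;
}

/* Ordering under discussion: vcpus first, then iothreads, then block layer. */
static void toy_shutdown(struct toy_vm *vm)
{
    toy_pause_vcpus(vm);
    toy_stop_iothreads(vm);
    toy_close_block_layer(vm);
}
```

With this ordering, toy_vcpu_submit() returns false after
toy_shutdown() and late_submissions stays at 0.  Calling
toy_stop_iothreads() while vcpus are still running is the case where
late_submissions can become nonzero.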


