qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] [PATCH] virtio-blk: dataplane: release AioContext befor


From: Kevin Wolf
Subject: Re: [Qemu-block] [PATCH] virtio-blk: dataplane: release AioContext before blk_set_aio_context
Date: Thu, 28 Feb 2019 18:22:02 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

Am 28.02.2019 um 18:04 hat Sergio Lopez geschrieben:
> On Thu, Feb 28, 2019 at 04:50:53PM +0100, Kevin Wolf wrote:
> > Am 28.02.2019 um 16:01 hat Sergio Lopez geschrieben:
> > > On Wed, Feb 27, 2019 at 06:37:14PM +0100, Kevin Wolf wrote:
> > > > Am 27.02.2019 um 17:52 hat Sergio Lopez geschrieben:
> > > > > Stopping the dataplane requires calling to blk_set_aio_context, which
> > > > > may need to wait for a running job to be completed or paused.
> > > > > 
> > > > > As stopping the dataplane is something that can be triggered from a 
> > > > > vcpu
> > > > > thread (due to the Guest requesting to stop the device), while the job
> > > > > itself may be managed by an iothread, holding the AioContext will lead
> > > > > to a deadlock, where the first one is waiting for the job to pause or
> > > > > finish, while the second can't make any progress as it's waiting for 
> > > > > the
> > > > > AioContext to be released:
> > > > > 
> > > > >  Thread 6 (LWP 90472)
> > > > >  #0  0x000055a6f8497aee in blk_root_drained_end
> > > > >  #1  0x000055a6f84a7926 in bdrv_parent_drained_end
> > > > >  #2  0x000055a6f84a799f in bdrv_do_drained_end
> > > > >  #3  0x000055a6f84a82ab in bdrv_drained_end
> > > > >  #4  0x000055a6f8498be8 in blk_drain
> > > > >  #5  0x000055a6f84a22cd in mirror_drain
> > > > >  #6  0x000055a6f8457708 in block_job_detach_aio_context
> > > > >  #7  0x000055a6f84538f1 in bdrv_detach_aio_context
> > > > >  #8  0x000055a6f8453a96 in bdrv_set_aio_context
> > > > >  #9  0x000055a6f84999f8 in blk_set_aio_context
> > > > >  #10 0x000055a6f81802c8 in virtio_blk_data_plane_stop
> > > > >  #11 0x000055a6f83cfc95 in virtio_bus_stop_ioeventfd
> > > > >  #12 0x000055a6f83d1f81 in virtio_pci_common_write
> > > > >  #13 0x000055a6f83d1f81 in virtio_pci_common_write
> > > > >  #14 0x000055a6f8148d62 in memory_region_write_accessor
> > > > >  #15 0x000055a6f81459c9 in access_with_adjusted_size
> > > > >  #16 0x000055a6f814a608 in memory_region_dispatch_write
> > > > >  #17 0x000055a6f80f1f98 in flatview_write_continue
> > > > >  #18 0x000055a6f80f214f in flatview_write
> > > > >  #19 0x000055a6f80f6a7b in address_space_write
> > > > >  #20 0x000055a6f80f6b15 in address_space_rw
> > > > >  #21 0x000055a6f815da08 in kvm_cpu_exec
> > > > >  #22 0x000055a6f8132fee in qemu_kvm_cpu_thread_fn
> > > > >  #23 0x000055a6f8551306 in qemu_thread_start
> > > > >  #24 0x00007f9bdf5b9dd5 in start_thread
> > > > >  #25 0x00007f9bdf2e2ead in clone
> > > > > 
> > > > >  Thread 8 (LWP 90467)
> > > > >  #0  0x00007f9bdf5c04ed in __lll_lock_wait
> > > > >  #1  0x00007f9bdf5bbde6 in _L_lock_941
> > > > >  #2  0x00007f9bdf5bbcdf in __GI___pthread_mutex_lock
> > > > >  #3  0x000055a6f8551447 in qemu_mutex_lock_impl
> > > > >  #4  0x000055a6f854be77 in co_schedule_bh_cb
> > > > >  #5  0x000055a6f854b781 in aio_bh_poll
> > > > >  #6  0x000055a6f854b781 in aio_bh_poll
> > > > >  #7  0x000055a6f854f01b in aio_poll
> > > > >  #8  0x000055a6f825a488 in iothread_run
> > > > >  #9  0x000055a6f8551306 in qemu_thread_start
> > > > >  #10 0x00007f9bdf5b9dd5 in start_thread
> > > > >  #11 0x00007f9bdf2e2ead in clone
> > > > > 
> > > > >  (gdb) thread 8
> > > > >  [Switching to thread 8 (Thread 0x7f9bd6dae700 (LWP 90467))]
> > > > >  #3  0x000055a6f8551447 in qemu_mutex_lock_impl (mutex=0x55a6fac8fea0,
> > > > >      file=0x55a6f8730c3f "util/async.c", line=511) at 
> > > > > util/qemu-thread-posix.c:66
> > > > >  66       err = pthread_mutex_lock(&mutex->lock);
> > > > >  (gdb) up 3
> > > > >  #3  0x000055a6f8551447 in qemu_mutex_lock_impl (mutex=0x55a6fac8fea0,
> > > > >      file=0x55a6f8730c3f "util/async.c", line=511) at 
> > > > > util/qemu-thread-posix.c:66
> > > > >  66       err = pthread_mutex_lock(&mutex->lock);
> > > > >  (gdb) p mutex.lock.__data.__owner
> > > > >  $6 = 90472
> > > > > 
> > > > > Signed-off-by: Sergio Lopez <address@hidden>
> > > > > ---
> > > > >  hw/block/dataplane/virtio-blk.c | 3 +--
> > > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/hw/block/dataplane/virtio-blk.c 
> > > > > b/hw/block/dataplane/virtio-blk.c
> > > > > index 8c37bd314a..358e6ae89b 100644
> > > > > --- a/hw/block/dataplane/virtio-blk.c
> > > > > +++ b/hw/block/dataplane/virtio-blk.c
> > > > > @@ -280,12 +280,11 @@ void virtio_blk_data_plane_stop(VirtIODevice 
> > > > > *vdev)
> > > > >  
> > > > >      aio_context_acquire(s->ctx);
> > > > >      aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
> > > > > +    aio_context_release(s->ctx);
> > > > >  
> > > > >      /* Drain and switch bs back to the QEMU main loop */
> > > > >      blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> > > > >  
> > > > > -    aio_context_release(s->ctx);
> > > > > -
> > > > 
> > > > blk_set_aio_context() requires that the caller hold the lock for the
> > > > source context, so I'm afraid this is wrong.
> > > 
> > > TBH, I was quite sure this patch was wrong myself, but I thought it was
> > > still a good way to illustrate the issue.
> > > 
> > > > However, I think the problem might already be fixed with my "block: Use
> > > > normal drain for bdrv_set_aio_context()" (contained in a pull request,
> > > > which isn't merged yet), which makes bdrv_set_aio_context() use a real
> > > > drain operation. This way, the requests of the mirror jobs should be
> > > > already drained before we even call into block_job_detach_aio_context().
> > > > 
> > > > Can you give this a try and see whether it fixes your problem?
> > > 
> > > I've applied your patchset to my local copy, but it doesn't fix the
> > > issue.
> > > 
> > > The problem is the coroutine is already scheduled to be run in the
> > > iothread context, which means job->busy == true, so we can't switch to
> > > it from any other place.
> > 
> > I still don't understand this because with job->paused == true the
> > hanging loop wouldn't even be started. And after bdrv_drained_begin()
> > returns, all jobs that use the node in question should be paused (see
> > child_job_drained_begin/poll).
> > 
> > So somehow that drain (called in bdrv_set_aio_context()) doesn't seem to
> > fully do what it is supposed to do?
> 
> IIUC, child_job_drained_begin() requests the job to be paused (something
> that block_job_detach_aio_context() also does), but we don't get to
> child_job_drained_poll() as bdrv_parent_drained_begin_single() is called
> with "poll == false" by bdrv_parent_drained_begin().
> 
> But even if we did, that probably won't help in some scenarios, as
> mirror's drained_poll implementation just checks if there are in_flight
> requests, so it might happen that BDRV_POLL_WHILE returns without having
> called aio_wait() a single time.
> 
> In other words, I think the drain code makes sure there aren't any
> in_flight requests in the chain, but doesn't provide by itself a
> guarantee that the jobs have been paused.

Yes, seems this is what it does. But I think that's not enough because
if the job isn't paused, it can still issue new requests.

I'm not sure whether mirror_drained_poll() or child_job_drained_poll()
is to blame, but the logic is wrong there: We should only return that
the job is quiescent when it has reached a pause point _and_
s->in_flight == 0.

IIUC, the reason why mirror even needs this .drained_poll implementation
is that it spawns additional coroutines, so waiting just for the main
coroutine to reach a pause point is not enough because the other
coroutines could still be active. But this is only an additional
condition, not one that could replace the condition that the main
coroutine must be inactive, too.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]