Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 00/22] dataplane: use QEMU block layer
Date: Tue, 6 May 2014 10:39:53 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, May 05, 2014 at 02:46:09PM +0200, Christian Borntraeger wrote:
> On 05/05/14 14:05, Stefan Hajnoczi wrote:
> > On Mon, May 05, 2014 at 11:17:44AM +0200, Christian Borntraeger wrote:
> >> On 01/05/14 16:54, Stefan Hajnoczi wrote:
> >>> This patch series switches virtio-blk data-plane from a custom Linux AIO
> >>> request queue to the QEMU block layer.  The previous "raw files only"
> >>> limitation is lifted.  All image formats and protocols can now be used
> >>> with virtio-blk data-plane.
> >>
> >> Nice. Is there a git branch somewhere, so that we can test this on s390?
> > 
> > Hi Christian,
> > I'm getting to work on v2 but you can grab this v1 series from git in
> > the meantime:
> > 
> > https://github.com/stefanha/qemu.git bdrv_set_aio_context
> > 
> > Stefan
> > 
> 
> In general the main path seems to work fine.
> 
> With lots of devices (one qcow2, 23 raw scsi disks)
> I get a hang on shutdown. kvm_stat claims that nothing is going on any more, 
> but somehow threads are stuck in ppoll.
> 
> gdb tells me that 
> 
> all cpus have
> #0  0x000003fffcde0ba0 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x000003fffcde3c0c in __pthread_mutex_cond_lock () from /lib64/libpthread.so.0
> #2  0x000003fffcddc99a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #3  0x00000000801f183a in qemu_cond_wait (cond=<optimized out>, address@hidden <qemu_global_mutex>) at /home/cborntra/REPOS/qemu/util/qemu-thread-posix.c:135
> #4  0x00000000801512f2 in qemu_kvm_wait_io_event (cpu=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:842
> #5  qemu_kvm_cpu_thread_fn (arg=0x80a53e10) at /home/cborntra/REPOS/qemu/cpus.c:878
> 
> all iothreads have
> #0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
> #1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (address@hidden, address@hidden, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
> #3  0x000000008001ae4c in aio_poll (ctx=0x807dd610, address@hidden) at /home/cborntra/REPOS/qemu/aio-posix.c:221
> #4  0x00000000800b2f6c in iothread_run (opaque=0x807dd4c8) at /home/cborntra/REPOS/qemu/iothread.c:41
> #5  0x000003fffcdd8412 in start_thread () from /lib64/libpthread.so.0
> #6  0x000003fffbc3f0ae in thread_start () from /lib64/libc.so.6
> 
> the main thread has
> Thread 1 (Thread 0x3fff9e5c9b0 (LWP 33684)):
> #0  0x000003fffbc348e0 in ppoll () from /lib64/libc.so.6
> #1  0x00000000800fcce6 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
> #2  qemu_poll_ns (address@hidden, address@hidden, timeout=-1) at /home/cborntra/REPOS/qemu/qemu-timer.c:311
> #3  0x000000008001ae4c in aio_poll (address@hidden, address@hidden) at /home/cborntra/REPOS/qemu/aio-posix.c:221
> #4  0x0000000080030c46 in bdrv_flush (address@hidden) at /home/cborntra/REPOS/qemu/block.c:4904
> #5  0x0000000080030ce8 in bdrv_flush_all () at /home/cborntra/REPOS/qemu/block.c:3723
> #6  0x0000000080152fe8 in do_vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:538
> #7  vm_stop (state=<optimized out>) at /home/cborntra/REPOS/qemu/cpus.c:1219
> #8  0x0000000000000000 in ?? ()
> 
> 
> How are the ppoll calls supposed to return if there is nothing going on?

The AioContext event loop includes an event notifier whose file descriptor is
part of the ppoll() set; writing to it kicks the AioContext out of ppoll().
That is how another thread signals the event loop.
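
Roughly, and only as an illustration (an untested sketch; the kick_iothread()
helper and kick_cb() callback names are made up): scheduling a bottom half on
the AioContext, or calling aio_notify() directly, writes to that event
notifier so aio_poll()'s ppoll() returns:

#include "block/aio.h"

/* Untested sketch: waking an AioContext that is blocked in aio_poll()/ppoll()
 * from another thread.  BH cleanup (qemu_bh_delete) is omitted for brevity.
 */

static void kick_cb(void *opaque)          /* made-up callback */
{
    /* runs later, in the thread that owns the AioContext */
}

static void kick_iothread(AioContext *ctx) /* made-up helper */
{
    QEMUBH *bh = aio_bh_new(ctx, kick_cb, NULL);

    qemu_bh_schedule(bh);  /* calls aio_notify(ctx) internally */

    /* or, if there is no work to schedule: */
    aio_notify(ctx);       /* writes to the event notifier; ppoll() returns */
}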

> PS: I think I have seen this before recently during managedsave, so it might 
> have been introduced with the iothread rework instead of this one.

I suspect this is due to a race condition in bdrv_flush_all().  In this
series I added AioContext acquire/release for bdrv_close_all() so that
vl.c:main() shutdown works.  It's probably a similar issue.
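
The shape of the fix will probably mirror what this series already does for
bdrv_close_all(): take each BlockDriverState's AioContext around the flush so
the main thread does not race with the dataplane IOThread's event loop.
Something along these lines (a sketch, not the actual patch):

int bdrv_flush_all(void)
{
    BlockDriverState *bs = NULL;
    int result = 0;

    while ((bs = bdrv_next(bs))) {
        AioContext *aio_context = bdrv_get_aio_context(bs);
        int ret;

        /* serialize against the IOThread running this bs's AioContext */
        aio_context_acquire(aio_context);
        ret = bdrv_flush(bs);
        if (ret < 0 && !result) {
            result = ret;
        }
        aio_context_release(aio_context);
    }

    return result;
}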

Thanks for raising this issue, I'll investigate and send a fix.  I suspect
it is not the same issue you saw during managedsave.

Stefan


