qemu-devel


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH for-2.4 0/2] AioContext: fix deadlock after aio_context_acquire() race
Date: Tue, 28 Jul 2015 11:31:49 +0100

On Tue, Jul 28, 2015 at 11:26 AM, Cornelia Huck
<address@hidden> wrote:
> On Tue, 28 Jul 2015 09:34:46 +0100
> Stefan Hajnoczi <address@hidden> wrote:
>
>> On Tue, Jul 28, 2015 at 10:02:26AM +0200, Cornelia Huck wrote:
>> > On Tue, 28 Jul 2015 09:07:00 +0200
>> > Cornelia Huck <address@hidden> wrote:
>> >
>> > > On Mon, 27 Jul 2015 17:33:37 +0100
>> > > Stefan Hajnoczi <address@hidden> wrote:
>> > >
>> > > > See Patch 2 for details on the deadlock after two aio_context_acquire()
>> > > > calls race.  This caused dataplane to hang on startup.
>> > > >
>> > > > Patch 1 is a memory leak fix for AioContext that's needed by Patch 2.
>> > > >
>> > > > Stefan Hajnoczi (2):
>> > > >   AioContext: avoid leaking BHs on cleanup
>> > > >   AioContext: force event loop iteration using BH
>> > > >
>> > > >  async.c             | 29 +++++++++++++++++++++++++++--
>> > > >  include/block/aio.h |  3 +++
>> > > >  2 files changed, 30 insertions(+), 2 deletions(-)
>> > > >
>> > >
>> > > Just gave this a try: The stripped-down guest that hangs during startup
>> > > on master is working fine with these patches applied, and my full setup
>> > > works as well.
>> > >
>> > > So,
>> > >
>> > > Tested-by: Cornelia Huck <address@hidden>
>> >
>> > Uh-oh, spoke too soon. It starts, but when I try a virsh managedsave, I
>> > get
>> >
>> > qemu-system-s390x: /data/git/yyy/qemu/async.c:242: aio_ctx_finalize: Assertion `ctx->first_bh->deleted' failed.
>>
>> Please pretty-print ctx->first_bh in gdb.  In particular, which function
>> is ctx->first_bh->cb pointing to?
>
> (gdb) p/x *(QEMUBH *)ctx->first_bh
> $2 = {ctx = 0x9aab3730, cb = 0x801b7c5c, opaque = 0x3ff9800dee0, next =
>     0x3ff9800dfb0, scheduled = 0x0, idle = 0x0, deleted = 0x0}
>
> cb is pointing at spawn_thread_bh_fn.
>
>>
>> I tried reproducing with qemu-system-x86_64 and a RHEL 7 guest but
>> couldn't trigger the assertion failure.
>
> I use the old x-data-plane attribute; if I turn it off, I don't hit the
> assertion.

Thanks.  I understand how to reproduce it now: use -drive aio=threads
and do I/O during managedsave.

I suspect there are more cases where a BH is still alive when its
AioContext is finalized.  We need to clean them up during the QEMU 2.5 cycle.

For now let's continue leaking these BHs as we've always done.
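
To make the trade-off concrete, here is a toy model of the two finalize
policies.  It is only a sketch, not the code in async.c: the ToyBH and
ToyCtx types and all toy_* functions are made-up names for illustration.
A strict finalizer would assert that toy_ctx_has_live_bh() is false,
which is what trips when a BH such as the spawn_thread_bh_fn one is still
alive.  The lenient finalizer frees only BHs already marked deleted and
leaves the live ones alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bottom half; NOT QEMU's QEMUBH. */
typedef struct ToyBH {
    struct ToyBH *next;
    void (*cb)(void *opaque);
    void *opaque;
    bool deleted;
} ToyBH;

/* Hypothetical stand-in for the context that owns the BH list. */
typedef struct ToyCtx {
    ToyBH *first_bh;
} ToyCtx;

/* Allocate a BH and push it onto the head of the context's list. */
static ToyBH *toy_bh_new(ToyCtx *ctx, void (*cb)(void *), void *opaque)
{
    ToyBH *bh = calloc(1, sizeof(*bh));
    if (!bh) {
        abort();
    }
    bh->cb = cb;
    bh->opaque = opaque;
    bh->next = ctx->first_bh;
    ctx->first_bh = bh;
    return bh;
}

/* Mark a BH as deleted; the memory is reclaimed later by the finalizer. */
static void toy_bh_delete(ToyBH *bh)
{
    bh->deleted = true;
}

/* True if any BH on the list has not been marked deleted.
 * A strict finalizer would assert that this returns false. */
static bool toy_ctx_has_live_bh(const ToyCtx *ctx)
{
    for (const ToyBH *bh = ctx->first_bh; bh; bh = bh->next) {
        if (!bh->deleted) {
            return true;
        }
    }
    return false;
}

/* Lenient finalizer: free deleted BHs, unlink but do not free live ones.
 * Returns the number of live BHs left behind (leaked unless their
 * creator still holds a pointer and frees them). */
static size_t toy_ctx_finalize_lenient(ToyCtx *ctx)
{
    size_t leaked = 0;
    while (ctx->first_bh) {
        ToyBH *bh = ctx->first_bh;
        ctx->first_bh = bh->next;
        if (bh->deleted) {
            free(bh);
        } else {
            leaked++;
        }
    }
    return leaked;
}
```

In this model, a context with one deleted and one live BH finalizes
without aborting and reports one leaked BH, whereas an assert on
toy_ctx_has_live_bh() would fail for the same list.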

Stefan


