From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH 3/3] linux-aio: fix re-entrant completion processing
Date: Tue, 27 Sep 2016 16:25:38 +0100
User-agent: Mutt/1.7.0 (2016-08-17)

On Tue, Sep 27, 2016 at 04:29:55PM +0200, Roman Penyaev wrote:
> On Tue, Sep 27, 2016 at 4:06 PM, Stefan Hajnoczi <address@hidden> wrote:
> > Commit 0ed93d84edabc7656f5c998ae1a346fe8b94ca54 ("linux-aio: process
> > completions from ioq_submit()") added an optimization that processes
> > completions each time ioq_submit() returns with requests in flight.
> > This commit introduces a "Co-routine re-entered recursively" error which
> > can be triggered with -drive format=qcow2,aio=native.
> >
> > Fam Zheng <address@hidden>, Kevin Wolf <address@hidden>, and I
> > debugged the following backtrace:
> >
> >   (gdb) bt
> >   #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
> >   #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
> >   #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at 
> > util/qemu-coroutine.c:113
> >   #3  0x0000555555a4b663 in qemu_laio_process_completions (address@hidden) 
> > at block/linux-aio.c:218
> >   #4  0x0000555555a4b874 in ioq_submit (address@hidden) at 
> > block/linux-aio.c:331
> >   #5  0x0000555555a4ba12 in laio_do_submit (address@hidden, address@hidden, 
> > address@hidden, address@hidden) at block/linux-aio.c:383
> >   #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, 
> > s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at 
> > block/linux-aio.c:402
> >   #7  0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden, 
> > address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> >   #8  0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden, 
> > address@hidden, address@hidden, address@hidden, address@hidden, 
> > address@hidden, flags=0) at block/io.c:1041
> >   #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> > offset=2932727808, bytes=8192, address@hidden, address@hidden) at 
> > block/io.c:1133
> >   #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, 
> > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) 
> > at block/qcow2.c:1509
> >   #11 0x0000555555a4fd23 in bdrv_driver_preadv (address@hidden, 
> > address@hidden, address@hidden, address@hidden, flags=0) at block/io.c:804
> >   #12 0x0000555555a52b34 in bdrv_aligned_preadv (address@hidden, 
> > address@hidden, address@hidden, address@hidden, address@hidden, 
> > address@hidden, flags=0) at block/io.c:1041
> >   #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, 
> > address@hidden, address@hidden, address@hidden, address@hidden) at 
> > block/io.c:1133
> >   #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, 
> > offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at 
> > block/block-backend.c:783
> >   #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at 
> > block/block-backend.c:991
> >   #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, 
> > i1=<optimized out>) at util/coroutine-ucontext.c:78
> >
> > It turned out that re-entrant ioq_submit() and completion processing
> > between three requests caused this error.  The following check is not
> > sufficient to prevent recursively entering coroutines:
> >
> >   if (laiocb->co != qemu_coroutine_self()) {
> >       qemu_coroutine_enter(laiocb->co);
> >   }
> >
> > As the following coroutine backtrace shows, the current coroutine
> > (self) is not the only one that may already be entered.  There might
> > also be other coroutines that are currently entered and have
> > transferred control due to the qcow2 lock (CoMutex):
> 
> I doubt that this was introduced by the commit you've specified:
> 0ed93d84edab.
> 
> Before my patch the coroutine was entered unconditionally.  The
> following is what was changed by 0ed93d84edab:
> 
>      if (laiocb->co) {
> -        qemu_coroutine_enter(laiocb->co);
> +        /* Jump and continue completion for foreign requests, don't do
> +         * anything for current request, it will be completed shortly. */
> +        if (laiocb->co != qemu_coroutine_self()) {
> +            qemu_coroutine_enter(laiocb->co);
> +        }

Unconditionally entering the coroutine was safe prior to 0ed93d84edab
because all coroutines had yielded by the time completions were
processed, so qemu_coroutine_entered() would have been false every
time.  Therefore it wasn't necessary to protect against re-entering a
coroutine.

> If you have a reliable reproducer, could you please verify that?

The bug is 100% deterministic.  Just boot up a guest with -drive
format=qcow2,aio=native.

I noticed that I forgot to include a second backtrace in the commit
description.  I am resending the patch series with the missing
backtrace.  Hopefully that will make the bug clearer.

Stefan
