From: Kevin Wolf
Subject: [Qemu-block] AioContext locking vs. coroutines (was: [PATCH] mirror dead-lock)
Date: Thu, 29 Nov 2018 11:42:55 +0100
User-agent: Mutt/1.10.1 (2018-07-13)

On 28.11.2018 at 21:13, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> We've faced the following mirror bug:
> 
> Just run mirror on a qcow2 image larger than 1G, and qemu ends up in a
> deadlock.
> 
> Note: I've decided to send this as a patch with a reproducer, to make
> it easier to reproduce. There is no need to commit this before the
> mirror fix; afterwards the commit message may be shortened a bit,
> maybe even down to just "iotests: add simple mirror test".
> 
> Note 2: if I drop the 'kvm' option from the test, it still reproduces,
> but if I use iotests.VM() it does not (maybe because of qtest?). If I
> add an iothread, it doesn't reproduce either. Also, it doesn't
> reproduce with raw instead of qcow2 and doesn't reproduce for small
> images.
> 
> So, here is the dead-lock:
> 
>  (gdb) info thr
>    Id   Target Id         Frame
>    3    Thread 0x7fd7fd4ea700 (LWP 145412) "qemu-system-x86" syscall () at 
> ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>    2    Thread 0x7fd7fc205700 (LWP 145416) "qemu-system-x86" __lll_lock_wait 
> () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>  * 1    Thread 0x7fd8102cfcc0 (LWP 145411) "qemu-system-x86" __lll_lock_wait 
> () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>  (gdb) bt
>  #0  0x00007fd80e8864cd in __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>  #1  0x00007fd80e881dcb in _L_lock_812 () at /lib64/libpthread.so.0
>  #2  0x00007fd80e881c98 in __GI___pthread_mutex_lock (mutex=0x561032dce420 
> <qemu_global_mutex>) at ../nptl/pthread_mutex_lock.c:79
>  #3  0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561032dce420 
> <qemu_global_mutex>, file=0x5610327d8654 "util/main-loop.c", line=236) at 
> util/qemu-thread-posix.c:66
>  #4  0x00005610320cb327 in qemu_mutex_lock_iothread_impl (file=0x5610327d8654 
> "util/main-loop.c", line=236) at 
> /work/src/qemu/up-mirror-dead-lock/cpus.c:1849
>  #5  0x000056103265038f in os_host_main_loop_wait (timeout=480116000) at 
> util/main-loop.c:236
>  #6  0x0000561032650450 in main_loop_wait (nonblocking=0) at 
> util/main-loop.c:497
>  #7  0x00005610322575c9 in main_loop () at vl.c:1892
>  #8  0x000056103225f3c7 in main (argc=13, argv=0x7ffcc8bb15a8, 
> envp=0x7ffcc8bb1618) at vl.c:4648
>  (gdb) p qemu_global_mutex
>  $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 145416, __nusers 
> = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 
> 0x0}},
>      __size = "\002\000\000\000\000\000\000\000\b8\002\000\001", '\000' 
> <repeats 26 times>, __align = 2}, file = 0x56103267bcb0 
> "/work/src/qemu/up-mirror-dead-lock/exec.c", line = 3197, initialized = true}
> 
>  So we see that thr1 is the main loop, and it now wants the BQL, which
>  is owned by thr2.
> 
>  (gdb) thr 2
>  [Switching to thread 2 (Thread 0x7fd7fc205700 (LWP 145416))]
>  #0  __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>  135     2:      movl    %edx, %eax
>  (gdb) bt
>  #0  0x00007fd80e8864cd in __lll_lock_wait () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>  #1  0x00007fd80e881de6 in _L_lock_870 () at /lib64/libpthread.so.0
>  #2  0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0) 
> at ../nptl/pthread_mutex_lock.c:114
>  #3  0x0000561032654c3b in qemu_mutex_lock_impl (mutex=0x561034d25dc0, 
> file=0x5610327d81d1 "util/async.c", line=511) at util/qemu-thread-posix.c:66
>  #4  0x000056103264d840 in aio_context_acquire (ctx=0x561034d25d60) at 
> util/async.c:511
>  #5  0x0000561032254554 in dma_blk_cb (opaque=0x7fd7f41463b0, ret=0) at 
> dma-helpers.c:169
>  #6  0x000056103225479d in dma_blk_io (ctx=0x561034d25d60, sg=0x561035a80210, 
> offset=0, align=512, io_func=0x5610322547a3 <dma_blk_read_io_func>, 
> io_func_opaque=0x561034d40870, cb=0x561032340b35 <ide_dma_cb>, 
> opaque=0x561035a7fee8, dir=DMA_DIRECTION_FROM_DEVICE) at dma-helpers.c:227
>  #7  0x0000561032254855 in dma_blk_read (blk=0x561034d40870, 
> sg=0x561035a80210, offset=0, align=512, cb=0x561032340b35 <ide_dma_cb>, 
> opaque=0x561035a7fee8) at dma-helpers.c:245
>  #8  0x0000561032340e6e in ide_dma_cb (opaque=0x561035a7fee8, ret=0) at 
> hw/ide/core.c:910
>  #9  0x000056103234a912 in bmdma_cmd_writeb (bm=0x561035a81030, val=9) at 
> hw/ide/pci.c:240
>  #10 0x000056103234b5bd in bmdma_write (opaque=0x561035a81030, addr=0, val=9, 
> size=1) at hw/ide/piix.c:76
>  #11 0x00005610320e5763 in memory_region_write_accessor (mr=0x561035a81180, 
> addr=0, value=0x7fd7fc204678, size=1, shift=0, mask=255, attrs=...) at 
> /work/src/qemu/up-mirror-dead-lock/memory.c:504
>  #12 0x00005610320e596d in access_with_adjusted_size (addr=0, 
> value=0x7fd7fc204678, size=1, access_size_min=1, access_size_max=4, access_fn=
>      0x5610320e5683 <memory_region_write_accessor>, mr=0x561035a81180, 
> attrs=...) at /work/src/qemu/up-mirror-dead-lock/memory.c:570
>  #13 0x00005610320e86ce in memory_region_dispatch_write (mr=0x561035a81180, 
> addr=0, data=9, size=1, attrs=...) at 
> /work/src/qemu/up-mirror-dead-lock/memory.c:1452
>  #14 0x0000561032083770 in flatview_write_continue (fv=0x7fd7f4108090, 
> addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", len=1, addr1=0, l=1, 
> mr=0x561035a81180)
>      at /work/src/qemu/up-mirror-dead-lock/exec.c:3233
>  #15 0x00005610320838cf in flatview_write (fv=0x7fd7f4108090, addr=49216, 
> attrs=..., buf=0x7fd810309000 "\t\371\061", len=1) at 
> /work/src/qemu/up-mirror-dead-lock/exec.c:3272
>  #16 0x0000561032083bef in address_space_write (as=0x561032dcdc40 
> <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", 
> len=1) at /work/src/qemu/up-mirror-dead-lock/exec.c:3362
>  #17 0x0000561032083c40 in address_space_rw (as=0x561032dcdc40 
> <address_space_io>, addr=49216, attrs=..., buf=0x7fd810309000 "\t\371\061", 
> len=1, is_write=true) at /work/src/qemu/up-mirror-dead-lock/exec.c:3373
>  #18 0x0000561032104766 in kvm_handle_io (port=49216, attrs=..., 
> data=0x7fd810309000, direction=1, size=1, count=1) at 
> /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:1775
>  #19 0x0000561032104f0e in kvm_cpu_exec (cpu=0x561034d60240) at 
> /work/src/qemu/up-mirror-dead-lock/accel/kvm/kvm-all.c:2021
>  #20 0x00005610320ca396 in qemu_kvm_cpu_thread_fn (arg=0x561034d60240) at 
> /work/src/qemu/up-mirror-dead-lock/cpus.c:1281
>  #21 0x00005610326557ab in qemu_thread_start (args=0x561034d81a70) at 
> util/qemu-thread-posix.c:498
>  #22 0x00007fd80e87fdd5 in start_thread (arg=0x7fd7fc205700) at 
> pthread_create.c:308
>  #23 0x00007fd804c97b3d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
>  (gdb) fr 2
>  #2  0x00007fd80e881cdf in __GI___pthread_mutex_lock (mutex=0x561034d25dc0) 
> at ../nptl/pthread_mutex_lock.c:114
>  114           LLL_MUTEX_LOCK (mutex);
>  (gdb) p *mutex
>  $2 = {__data = {__lock = 2, __count = 16, __owner = 145411, __nusers = 1, 
> __kind = 1, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 
> 0x0}},
>    __size = 
> "\002\000\000\000\020\000\000\000\003\070\002\000\001\000\000\000\001", 
> '\000' <repeats 22 times>, __align = 68719476738}
> 
>  And thr2 wants the AioContext mutex, which is owned by thr1. A
>  classic deadlock.
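
For anyone following along, the shape of this is the usual ABBA lock
order inversion between the BQL and the AioContext lock. Here is a
minimal standalone sketch of just that shape; it is plain pthreads, not
QEMU code, and the thread/lock names are only meant to mirror the roles
of thr1 and thr2 in the backtraces above:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t aio_ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bql          = PTHREAD_MUTEX_INITIALIZER;

/* Plays the role of thr1 (main loop): already holds the AioContext
 * lock and then wants the BQL. */
static void *main_loop_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&aio_ctx_lock);
    sleep(1);                           /* let the other thread take the BQL */
    pthread_mutex_lock(&bql);           /* blocks forever */
    pthread_mutex_unlock(&bql);
    pthread_mutex_unlock(&aio_ctx_lock);
    return NULL;
}

/* Plays the role of thr2 (vCPU thread): holds the BQL and then wants
 * the AioContext lock, like dma_blk_cb() above. */
static void *vcpu_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&bql);
    sleep(1);                           /* let the other thread take the AioContext lock */
    pthread_mutex_lock(&aio_ctx_lock);  /* blocks forever */
    pthread_mutex_unlock(&aio_ctx_lock);
    pthread_mutex_unlock(&bql);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, main_loop_thread, NULL);
    pthread_create(&t2, NULL, vcpu_thread, NULL);
    pthread_join(t1, NULL);             /* never returns: ABBA deadlock */
    pthread_join(t2, NULL);
    return 0;
}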
> 
>  But who took the AioContext? Let's suppose that it is the mirror job
>  and check:
> 
>  (gdb) p blk_next(0)->root->bs->job
>  $3 = (BlockJob *) 0x5610358e2520
>  (gdb) p (MirrorBlockJob *)$3
>  $4 = (MirrorBlockJob *) 0x5610358e2520
>  [...]
>  (gdb) p $4->target->root->bs->tracked_requests.lh_first->co
>  $7 = (Coroutine *) 0x561035dd0860
>  [...]
>  (gdb) qemu coroutine 0x561035dd0860
>  #0  0x0000561032670ab2 in qemu_coroutine_switch (from_=0x561035dd0860, 
> to_=0x7fd8102cfb80, action=COROUTINE_YIELD) at util/coroutine-ucontext.c:218
>  #1  0x000056103266f0fc in qemu_coroutine_yield () at 
> util/qemu-coroutine.c:194
>  #2  0x000056103266fb76 in qemu_co_mutex_lock_slowpath (ctx=0x561034d25d60, 
> mutex=0x561035c5a4b0) at util/qemu-coroutine-lock.c:235
>  #3  0x000056103266fc82 in qemu_co_mutex_lock (mutex=0x561035c5a4b0) at 
> util/qemu-coroutine-lock.c:273
>  #4  0x0000561032544f4e in qcow2_co_pwritev (bs=0x561035b86890, 
> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at 
> block/qcow2.c:2063
>  #5  0x000056103258e5df in bdrv_driver_pwritev (bs=0x561035b86890, 
> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at 
> block/io.c:1092
>  #6  0x000056103259004b in bdrv_aligned_pwritev (child=0x561035b178c0, 
> req=0x7fd7a23eae30, offset=531628032, bytes=1048576, align=1, 
> qiov=0x561034fd08a8, flags=0) at block/io.c:1719
>  #7  0x0000561032590cb2 in bdrv_co_pwritev (child=0x561035b178c0, 
> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at 
> block/io.c:1961
>  #8  0x0000561032578478 in blk_co_pwritev (blk=0x561035dd0400, 
> offset=531628032, bytes=1048576, qiov=0x561034fd08a8, flags=0) at 
> block/block-backend.c:1203
>  #9  0x00005610325866a0 in mirror_read_complete (op=0x561034fd08a0, ret=0) at 
> block/mirror.c:232
>  #10 0x0000561032586e1a in mirror_co_read (opaque=0x561034fd08a0) at 
> block/mirror.c:370
>  #11 0x000056103267081d in coroutine_trampoline (i0=903678048, i1=22032) at 
> util/coroutine-ucontext.c:116
>  #12 0x00007fd804be0fc0 in __start_context () at /lib64/libc.so.6
>  #13 0x00007fd7fafff600 in  ()
>  #14 0x0000000000000000 in  ()
> 
>  Ok, look at mirror_read_complete:
>  static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
>  {
>      MirrorBlockJob *s = op->s;
> 
>      aio_context_acquire(blk_get_aio_context(s->common.blk));
>      if (ret < 0) {
>          BlockErrorAction action;
> 
>          bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
>          action = mirror_error_action(s, true, -ret);
>          if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
>              s->ret = ret;
>          }
> 
>          mirror_iteration_done(op, ret);
>      } else {
>          ret = blk_co_pwritev(s->target, op->offset,
>                               op->qiov.size, &op->qiov, 0);
>          mirror_write_complete(op, ret);
>      }
>      aio_context_release(blk_get_aio_context(s->common.blk));
>  }
> 
>  Line 232 is the blk_co_pwritev call. So we acquire the AioContext and
>  go on to write, which may yield while the lock is still held, and
>  that is what makes the deadlock possible.
> 
>  And bisect points to
>  commit 2e1990b26e5aa1ba1a730aa6281df123bb7a71b6 (refs/bisect/bad)
>  Author: Max Reitz <address@hidden>
>  Date:   Wed Jun 13 20:18:11 2018 +0200
> 
>      block/mirror: Convert to coroutines
> 
>  So, yes, 3.0 has the bug.
> 
>  The interesting thing is that this aio_context_acquire/release pair
>  existed even before that patch; it was added in 2.9:
>  commit b9e413dd3756f71abe4e8cafe1d7a459ce74ccf4
>  Author: Paolo Bonzini <address@hidden>
>  Date:   Mon Feb 13 14:52:32 2017 +0100
> 
>      block: explicitly acquire aiocontext in aio callbacks that need it
> 
>  In 2.9 mirror_read_complete was a callback for blk_aio_preadv. But
>  honestly, I don't understand why it needed to acquire the context: it
>  is inserted into this context as a BH, so why does it need to acquire
>  the context on top of that?

BHs don't automatically take the AioContext lock of the AioContext from
which they are run.
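
So a BH callback that touches state protected by the AioContext lock
has to take the lock itself. Roughly like this, as an untested sketch
against the aio API; MyState, my_bh_cb and schedule_my_bh are made-up
names, not anything from the mirror code:

#include "qemu/osdep.h"
#include "block/aio.h"

typedef struct MyState {
    AioContext *ctx;
    int counter;                    /* protected by the AioContext lock */
} MyState;

static void my_bh_cb(void *opaque)
{
    MyState *s = opaque;

    /* The BH runs in s->ctx's home thread, but nobody has taken the
     * AioContext lock for us, so acquire it before touching protected
     * state. */
    aio_context_acquire(s->ctx);
    s->counter++;
    aio_context_release(s->ctx);
}

static void schedule_my_bh(MyState *s)
{
    /* Scheduling the BH doesn't involve the lock either. */
    aio_bh_schedule_oneshot(s->ctx, my_bh_cb, s);
}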

>  Anyway, after 2e1990b26e5 mirror_read/write_complete are no longer
>  callbacks, so even if b9e413dd3 was OK, it no longer applies to this
>  place.

I guess in this specific instance we can just remove the locking from
mirror_read/write_complete, because it only locks an AioContext whose
lock we already hold anyway (we're in a coroutine owned by that
AioContext), so yielding will release the lock and avoid the deadlock.
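
Concretely, I would expect the fix to look more or less like the
function quoted above with the acquire/release pair simply dropped
(untested sketch, and mirror_write_complete would need the same
treatment):

static void coroutine_fn mirror_read_complete(MirrorOp *op, int ret)
{
    MirrorBlockJob *s = op->s;

    /* No aio_context_acquire()/release() pair: we run as a coroutine
     * in the AioContext of s->common.blk, so its lock is already held
     * by the thread that entered us, and blk_co_pwritev() is free to
     * yield without leaving an extra lock count behind. */
    if (ret < 0) {
        BlockErrorAction action;

        bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset, op->bytes);
        action = mirror_error_action(s, true, -ret);
        if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
            s->ret = ret;
        }

        mirror_iteration_done(op, ret);
    } else {
        ret = blk_co_pwritev(s->target, op->offset,
                             op->qiov.size, &op->qiov, 0);
        mirror_write_complete(op, ret);
    }
}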

But more generally, does this mean that it is never correct to have an
explicit aio_context_acquire/release() pair in coroutine context, at
least for all practical cases where you call things that could yield?
And if you're inside a coroutine and want to access something in a
different AioContext, you need to drop out of the coroutine first?

If so, should aio_context_acquire() then assert that it's not called
from coroutine context?
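
Something like this, assuming aio_context_acquire() still just takes
ctx->lock in util/async.c (untested sketch, the assertion is the only
addition):

void aio_context_acquire(AioContext *ctx)
{
    /* Taking the AioContext lock explicitly from coroutine context is
     * suspicious at best: the lock stays with the thread across a
     * yield, which is exactly how the deadlock above happens. */
    assert(!qemu_in_coroutine());
    qemu_rec_mutex_lock(&ctx->lock);
}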

Paolo, will this restriction on coroutines be solved for good when you
get your patches merged that remove the AioContext lock?

Kevin


