On 05.09.23 at 12:01, Fiona Ebner wrote:
> Can we assume block_job_remove_all_bdrv() to always hold the job's
> AioContext? And if yes, can we just tell bdrv_graph_wrlock() that it
> needs to release that before polling to fix the deadlock?
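For context, my understanding is that bdrv_graph_wrlock() already has a
mechanism along those lines when it is passed a non-NULL bs: it temporarily
drops that node's AioContext around the polling and re-acquires it afterwards.
Very roughly (a trimmed-down sketch of the shape in block/graph-lock.c, not
the actual code; reader_count() stands in for the internal reader bookkeeping
and the drain/has_writer handling is omitted):

void bdrv_graph_wrlock(BlockDriverState *bs)
{
    AioContext *ctx = NULL;

    GLOBAL_STATE_CODE();

    /* Only a non-mainloop AioContext gets released around the polling. */
    if (bs) {
        ctx = bdrv_get_aio_context(bs);
        if (ctx == qemu_get_aio_context()) {
            ctx = NULL;
        } else {
            aio_context_release(ctx);
        }
    }

    /* Poll until all readers have left their graph-lock critical sections. */
    AIO_WAIT_WHILE_UNLOCKED(NULL, reader_count() >= 1);

    if (ctx) {
        aio_context_acquire(ctx);
    }
}

So the idea below is to hand it the individual child's bs instead of NULL.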
I tried doing something similar as a proof of concept:
diff --git a/blockjob.c b/blockjob.c
index 58c5d64539..1a696241a0 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,19 +198,19 @@ void block_job_remove_all_bdrv(BlockJob *job)
      * one to make sure that such a concurrent access does not attempt
      * to process an already freed BdrvChild.
      */
-    bdrv_graph_wrlock(NULL);
     while (job->nodes) {
         GSList *l = job->nodes;
         BdrvChild *c = l->data;
         job->nodes = l->next;
+        bdrv_graph_wrlock(c->bs);
         bdrv_op_unblock_all(c->bs, job->blocker);
         bdrv_root_unref_child(c);
+        bdrv_graph_wrunlock();
         g_slist_free_1(l);
     }
-    bdrv_graph_wrunlock();
 }
and while it did get slightly further, I ran into another deadlock with:
#0 0x00007f1941155136 in __ppoll (fds=0x55992068fb20, nfds=2, timeout=<optimized
out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1 0x000055991c6a1a3f in qemu_poll_ns (fds=0x55992068fb20, nfds=2, timeout=-1)
at ../util/qemu-timer.c:339
#2 0x000055991c67ed6c in fdmon_poll_wait (ctx=0x55991f058810,
ready_list=0x7ffda8c987b0, timeout=-1) at ../util/fdmon-poll.c:79
#3 0x000055991c67e6a8 in aio_poll (ctx=0x55991f058810, blocking=true) at
../util/aio-posix.c:670
#4 0x000055991c50a763 in bdrv_graph_wrlock (bs=0x0) at
../block/graph-lock.c:145
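Note that in frame #4, bs=0x0, so this caller gives bdrv_graph_wrlock() no
AioContext it could drop before polling. My guess is that it is the same
pattern as the original deadlock, i.e. something like the following
(hypothetical illustration only; the real caller above frame #4 is not
visible in the cut-off backtrace and some_graph_change() is a made-up name):

static void some_graph_change(BlockDriverState *bs)
{
    AioContext *ctx = bdrv_get_aio_context(bs);

    aio_context_acquire(ctx);    /* an iothread's AioContext is held...    */
    bdrv_graph_wrlock(NULL);     /* ...but with bs=NULL nothing is dropped */
    /*
     * The aio_poll() in frames #0-#3 then waits for the last graph-lock
     * reader to finish; a reader coroutine running in that iothread cannot
     * make progress (and drop its reader lock) while we still hold its
     * AioContext, so the poll never returns.
     */
    bdrv_graph_wrunlock();
    aio_context_release(ctx);
}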