Re: bdrv_drained_begin deadlock with io-threads
From: Dietmar Maurer
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Tue, 31 Mar 2020 17:24:22 +0200 (CEST)
> > How can I see/debug those waiting requests?
>
> Examine bs->tracked_requests list.
>
> BdrvTrackedRequest has a "Coroutine *co" field. It's a pointer to the
> coroutine of this request. You may use the qemu-gdb script to print the
> request's coroutine back-trace:
I would, but there are no tracked requests at all.
print bs->tracked_requests
$2 = {lh_first = 0x0}
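Since the list is empty, I guess whatever keeps the drain loop busy is
reported by one of the parents rather than tracked on the node itself. As a
sketch (assuming the only parent here is the BlockBackend's root child, so
that child->opaque points to the BlockBackend; field names as in the block
layer headers of this version), the parent side should be inspectable from
the same gdb session:

(gdb) set $child = bs->parents.lh_first
(gdb) print $child->name
(gdb) print ((BlockBackend *) $child->opaque)->in_flight
(gdb) print ((BlockBackend *) $child->opaque)->quiesce_counter

If that in_flight counter is the one stuck above zero, the pending work sits
at the BlockBackend/device level, which would also explain why
bs->tracked_requests stays empty.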
> gdb> source qemu_source/scripts/qemu-gdb.py
>
> gdb> qemu coroutine CO_POINTER
>
> - this will show what exactly the request is doing / waiting for right now.
(gdb) up
#1 0x0000555555c60489 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized
out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
77 return __ppoll_alias (__fds, __nfds, __timeout, __ss);
(gdb) up
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>,
timeout=timeout@entry=-1) at util/qemu-timer.c:335
335 return ppoll((struct pollfd *)fds, nfds, NULL, NULL);
(gdb) up
#3 0x0000555555c62c71 in fdmon_poll_wait (ctx=0x7fffe8905e80,
ready_list=0x7fffffffdce8, timeout=-1) at util/fdmon-poll.c:79
79 ret = qemu_poll_ns(pollfds, npfd, timeout);
(gdb) up
#4 0x0000555555c62257 in aio_poll (ctx=0x7fffe8905e80,
blocking=blocking@entry=true) at util/aio-posix.c:589
589 ret = ctx->fdmon_ops->wait(ctx, &ready_list, timeout);
(gdb) up
#5 0x0000555555bc25e5 in bdrv_do_drained_begin (poll=<optimized out>,
ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x7fffe8954bc0)
at block/io.c:430
430 BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive,
parent));
(gdb) print *bs
$1 = {open_flags = 139426, read_only = false, encrypted = false, sg = false,
probed = false, force_share = false, implicit = false,
drv = 0x555556115080 <bdrv_raw>, opaque = 0x7fffe8918400, aio_context =
0x7fffe8915180, aio_notifiers = {lh_first = 0x0}, walking_aio_notifiers =
false,
filename = "/dev/pve/vm-101-disk-0", '\000' <repeats 4073 times>,
backing_file = '\000' <repeats 4095 times>,
auto_backing_file = '\000' <repeats 4095 times>, backing_format = '\000'
<repeats 15 times>, full_open_options = 0x7fffe562c000,
exact_filename = "/dev/pve/vm-101-disk-0", '\000' <repeats 4073 times>,
backing = 0x0, file = 0x7fffe88e9b60, bl = {request_alignment = 1,
max_pdiscard = 0, pdiscard_alignment = 0, max_pwrite_zeroes = 0,
pwrite_zeroes_alignment = 0, opt_transfer = 0, max_transfer = 0,
min_mem_alignment = 512, opt_mem_alignment = 4096, max_iov = 1024},
supported_write_flags = 64, supported_zero_flags = 324,
node_name = "#block163", '\000' <repeats 22 times>, node_list = {tqe_next =
0x7fffe8975180, tqe_circ = {tql_next = 0x7fffe8975180,
tql_prev = 0x7fffe8963540}}, bs_list = {tqe_next = 0x7fffe895f480,
tqe_circ = {tql_next = 0x7fffe895f480,
tql_prev = 0x555556114f10 <all_bdrv_states>}}, monitor_list = {tqe_next =
0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x0}}, refcnt = 2, op_blockers = {
{lh_first = 0x0} <repeats 16 times>}, inherits_from = 0x0, children =
{lh_first = 0x7fffe88e9b60}, parents = {lh_first = 0x7fffe88ea180},
options = 0x7fffe8933400, explicit_options = 0x7fffe8934800, detect_zeroes =
BLOCKDEV_DETECT_ZEROES_OPTIONS_ON, backing_blocker = 0x0,
total_sectors = 67108864, before_write_notifiers = {notifiers = {lh_first =
0x0}}, write_threshold_offset = 0, write_threshold_notifier = {notify = 0x0,
node = {le_next = 0x0, le_prev = 0x0}}, dirty_bitmap_mutex = {lock =
pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared =
No,
Protocol = None}, initialized = true}, dirty_bitmaps = {lh_first = 0x0},
wr_highest_offset = {value = 28049412096}, copy_on_read = 0, in_flight = 0,
serialising_in_flight = 0, io_plugged = 0, enable_write_cache = 0,
quiesce_counter = 1, recursive_quiesce_counter = 0, write_gen = 113581,
reqs_lock = {
locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first =
0x0}, handoff = 0, sequence = 0, holder = 0x0}, tracked_requests = {
lh_first = 0x0}, flush_queue = {entries = {sqh_first = 0x0, sqh_last =
0x7fffe8958e38}}, active_flush_req = false, flushed_gen = 112020,
never_freeze = false}
Looks like bdrv_parent_drained_poll_single() calls blk_root_drained_poll(),
which returns true in my case (in_flight > 5). Looks like I am losing poll
events somewhere?
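Since the node sits in an IOThread's AioContext (the aio_context pointer in
the dump above), I suppose it is also worth checking what every thread is
doing before assuming events get lost: if the IOThread that normally
services that context is itself blocked (e.g. stuck on a lock or in its own
poll), the in-flight requests can never complete and BDRV_POLL_WHILE will
never make progress. Plain gdb should be enough for that:

(gdb) info threads
(gdb) thread apply all bt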