From: Fam Zheng
Subject: Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
Date: Fri, 10 Nov 2017 10:36:04 +0800
User-agent: Mutt/1.9.1 (2017-09-22)
On Thu, 11/09 20:31, Max Reitz wrote:
> On 2017-11-09 16:30, Fam Zheng wrote:
> > On Thu, 11/09 16:14, Max Reitz wrote:
> >> On 2017-11-09 05:21, Fam Zheng wrote:
> >>> On Thu, 11/09 01:48, Max Reitz wrote:
> >>>> Hi,
> >>>>
> >>>> More exciting news from the bdrv_drain() front!
> >>>>
> >>>> I've noticed in the past that iotest 194 sometimes hangs. I usually run
> >>>> the tests on tmpfs, but I've just now verified that it happens on my SSD
> >>>> just as well.
> >>>>
> >>>> So the reproducer is a plain:
> >>>>
> >>>> while ./check -raw 194; do :; done
> >>>
> >>> I cannot produce it on my machine.
> >>
> >> Hm, too bad. I see it both on my work laptop (with Fedora) and my
> >> desktop (with Arch)...
> >>
> >>>> (No difference between raw or qcow2, though.)
> >>>>
> >>>> And then, after a couple of runs (or a couple ten), it will just hang.
> >>>> The reason is that the source VM lingers around and doesn't quit
> >>>> voluntarily -- the test itself was successful, but it just can't exit.
> >>>>
> >>>> If you force it to exit by killing the VM (e.g. through pkill -11 qemu),
> >>>> this is the backtrace:
> >>>>
> >>>> #0 0x00007f7cfc297e06 in ppoll () at /lib64/libc.so.6
> >>>> #1 0x0000563b846bcac9 in ppoll (__ss=0x0, __timeout=0x0,
> >>>> __nfds=<optimized out>, __fds=<optimized out>) at
> >>>> /usr/include/bits/poll2.h:77
> >>>
> >>> Looking at the 0 timeout it seems we are in the aio_poll(ctx, blocking=false)
> >>> branch of BDRV_POLL_WHILE? Is it a busy loop? If so I wonder what is making
> >>> progress and causing the return value to be true in aio_poll().
> >>>
> >>>> #2 0x0000563b846bcac9 in qemu_poll_ns (fds=<optimized out>,
> >>>> nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
> >>>> #3 0x0000563b846be711 in aio_poll (address@hidden,
> >>>> blocking=<optimized out>) at util/aio-posix.c:629
> >>>> #4 0x0000563b8463afa4 in bdrv_drain_recurse
> >>>> (address@hidden, address@hidden) at block/io.c:201
> >>>> #5 0x0000563b8463baff in bdrv_drain_all_begin () at block/io.c:381
> >>>> #6 0x0000563b8463bc99 in bdrv_drain_all () at block/io.c:411
> >>>> #7 0x0000563b8459888b in block_migration_cleanup (opaque=<optimized
> >>>> out>) at migration/block.c:714
> >>>> #8 0x0000563b845883be in qemu_savevm_state_cleanup () at
> >>>> migration/savevm.c:1251
> >>>> #9 0x0000563b845811fd in migration_thread (opaque=0x563b856f1da0) at
> >>>> migration/migration.c:2298
> >>>> #10 0x00007f7cfc56f36d in start_thread () at /lib64/libpthread.so.0
> >>>> #11 0x00007f7cfc2a3e1f in clone () at /lib64/libc.so.6
> >>>>
> >>>>
> >>>> And when you make bdrv_drain_all_begin() print what we are trying to
> >>>> drain, you can see that it's the format node (managed by the "raw"
> >>>> driver in this case).
> >>>
> >>> So what is the value of bs->in_flight?
> >>
> >> gdb:
> >>> (gdb) print bs->in_flight
> >>> $3 = 2307492233
> >>
> >> "That's weird, why would it..."
> >>
> >>> (gdb) print *bs
> >>> $4 = {open_flags = -1202160144, read_only = 161, encrypted = 85, sg =
> >>> false, probed = false, force_share = 96, implicit = 159, drv = 0x0,
> >>> opaque = 0x0, aio_context = 0x8989898989898989, aio_notifiers = {lh_first
> >>> = 0x8989898989898989},
> >>> walking_aio_notifiers = 137, filename = '\211' <repeats 4096 times>,
> >>> backing_file = '\211' <repeats 4096 times>, backing_format = '\211'
> >>> <repeats 16 times>, full_open_options = 0x8989898989898989,
> >>> exact_filename = '\211' <repeats 4096 times>, backing =
> >>> 0x8989898989898989, file = 0x8989898989898989, bl = {request_alignment =
> >>> 2307492233, max_pdiscard = -1987475063, pdiscard_alignment = 2307492233,
> >>> max_pwrite_zeroes = -1987475063, pwrite_zeroes_alignment =
> >>> 2307492233, opt_transfer = 2307492233, max_transfer = 2307492233,
> >>> min_mem_alignment = 9910603678816504201, opt_mem_alignment =
> >>> 9910603678816504201, max_iov = -1987475063},
> >>> supported_write_flags = 2307492233, supported_zero_flags = 2307492233,
> >>> node_name = '\211' <repeats 32 times>, node_list = {tqe_next =
> >>> 0x8989898989898989, tqe_prev = 0x8989898989898989}, bs_list = {tqe_next =
> >>> 0x8989898989898989,
> >>> tqe_prev = 0x8989898989898989}, monitor_list = {tqe_next =
> >>> 0x8989898989898989, tqe_prev = 0x8989898989898989}, refcnt = -1987475063,
> >>> op_blockers = {{lh_first = 0x8989898989898989} <repeats 16 times>}, job =
> >>> 0x8989898989898989,
> >>> inherits_from = 0x8989898989898989, children = {lh_first =
> >>> 0x8989898989898989}, parents = {lh_first = 0x8989898989898989}, options =
> >>> 0x8989898989898989, explicit_options = 0x8989898989898989, detect_zeroes
> >>> = 2307492233,
> >>> backing_blocker = 0x8989898989898989, total_sectors =
> >>> -8536140394893047415, before_write_notifiers = {notifiers = {lh_first =
> >>> 0x8989898989898989}}, write_threshold_offset = 9910603678816504201,
> >>> write_threshold_notifier = {notify =
> >>> 0x8989898989898989, node = {le_next = 0x8989898989898989, le_prev =
> >>> 0x8989898989898989}}, dirty_bitmap_mutex = {lock = {__data = {__lock =
> >>> -1987475063, __count = 2307492233, __owner = -1987475063, __nusers =
> >>> 2307492233,
> >>> __kind = -1987475063, __spins = -30327, __elision = -30327,
> >>> __list = {__prev = 0x8989898989898989, __next = 0x8989898989898989}},
> >>> __size = '\211' <repeats 40 times>, __align = -8536140394893047415},
> >>> initialized = 137},
> >>> dirty_bitmaps = {lh_first = 0x8989898989898989}, wr_highest_offset =
> >>> {value = 9910603678816504201}, copy_on_read = -1987475063, in_flight =
> >>> 2307492233, serialising_in_flight = 2307492233, wakeup = 137, io_plugged
> >>> = 2307492233,
> >>> enable_write_cache = -1987475063, quiesce_counter = -1987475063,
> >>> write_gen = 2307492233, reqs_lock = {locked = 2307492233, ctx =
> >>> 0x8989898989898989, from_push = {slh_first = 0x8989898989898989}, to_pop
> >>> = {
> >>> slh_first = 0x8989898989898989}, handoff = 2307492233, sequence =
> >>> 2307492233, holder = 0x8989898989898989}, tracked_requests = {lh_first =
> >>> 0x8989898989898989}, flush_queue = {entries = {sqh_first =
> >>> 0x8989898989898989,
> >>> sqh_last = 0x8989898989898989}}, active_flush_req = 160,
> >>> flushed_gen = 0}
> >> "Oh."
> >>
> >> *sigh*
> >>
> >> OK, I'll look into it...
> >
> > OK, I'll leave it to you. Just one more thing: could it be related to the
> > use-after-free bug reported for block_job_defer_to_main_loop()?
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html
>
> Thanks for the heads-up; I think it's a different issue, though.
>
> What appears to be happening is that the mirror job completes and then
> drains its BDS. While that is happening, a bdrv_drain_all() comes in
> from block_migration_cleanup().
>
> That now tries to drain the mirror node. However, that node cannot be
> drained until the job is truly gone, so that is what happens next:
> mirror_exit() is called, it cleans up, destroys the mirror node, and
> returns.
>
> Now bdrv_drain_all() can go on, specifically the BDRV_POLL_WHILE() on
> the mirror node. However, oops, that node is gone now... So that's
> where the issue seems to be. :-/
>
> Maybe all that we need to do is wrap the bdrv_drain_recurse() call in
> bdrv_drain_all_begin() in a bdrv_ref()/bdrv_unref() pair? Having run
> 194 for a couple of minutes, that does indeed seem to work -- until it
> dies because of an invalid BB pointer in bdrv_next(). I guess that is
> because bdrv_next() does not guard against deleted BDSs.
>
> Copying all BDSs into a separate list (in both bdrv_drain_all_begin()
> and bdrv_drain_all_end()), with a strong reference to every single one,
> and then draining them really does seem to work, though. (It survived
> 9000 iterations, which seems good enough for something that usually
> fails after, like, 5.)
Yes, that makes sense. I'm curious whether the patch in
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01649.html
would also work.
Fam