
From: Max Reitz
Subject: Re: [Qemu-devel] [Bug 1570134] Re: While committing snapshot qemu crashes with SIGABRT
Date: Wed, 20 Apr 2016 20:09:55 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2

On 20.04.2016 02:03, Matthew Schumacher wrote:
> Max,
> Qemu still crashes for me, but the debug is again very different.  When
> I attach to the qemu process from gdb, it is unable to provide a
> backtrace when it crashes.  The log file is different too.  Any ideas?
> qemu-system-x86_64: block.c:2307: bdrv_replace_in_backing_chain:
> Assertion `!bdrv_requests_pending(old)' failed.

This message is exactly the same as you saw in 2.5.1, so I guess we've
at least averted a regression in 2.6.0.

I'm CC-ing some people who are more involved with this (Paolo is on PTO
right now, but well...). (The following is more of a note to those
people than to you, Matthew.)

Summary: I think bdrv_drained_begin() does not behave as advertised.

So the assertion that is failing here asserts that no requests are
pending on the mirror block job's source BDS. However, we do invoke
bdrv_drained_begin() on exactly that BDS at the end of mirror_run().

When that function returns, there are indeed no more requests pending
for that BDS. But once mirror_exit() is invoked, there may be new
requests pending.

I reproduced this by running bonnie++ in a guest, then committing a
snapshot and invoking block-job-complete right after the
BLOCK_JOB_READY event; sometimes bdrv_requests_pending(s->common.bs)
is true in mirror_exit() (which is bad), sometimes it's false. I just
used a plain virtio-blk drive without dataplane.

I'm not sure exactly how bdrv_drained_begin() and in turn
aio_disable_external() are supposed to work, but as a matter of fact a
BDS may receive requests even after those functions have been called.
Just putting an assert(!bs->quiesce_counter) into
tracked_request_begin() makes it fail even before the mirror block job
is started (due to some flush).

So in my case the request that breaks the mirror job comes from
blk_aio_ready_entry(); putting an assert(!blk_bs(blk)->quiesce_counter)
into blk_aio_readv() yields the following backtrace:

#0  0x00007f3e750bd2a8 in raise () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007f3e750be72a in abort () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007f3e750b61b7 in __assert_fail_base () from /usr/lib/libc.so.6
No symbol table info available.
#3  0x00007f3e750b6262 in __assert_fail () from /usr/lib/libc.so.6
No symbol table info available.
#4  0x0000564cf7d4e25e in blk_aio_readv (blk=<optimized out>,
sector_num=<optimized out>, iov=<optimized out>, nb_sectors=<optimized
out>, cb=<optimized out>, opaque=<optimized out>) at
        __PRETTY_FUNCTION__ = "blk_aio_readv"
#5  0x0000564cf7ab2cf3 in submit_requests (niov=<optimized out>,
num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>,
blk=<optimized out>) at qemu/hw/block/virtio-blk.c:361
        nb_sectors = <optimized out>
        is_write = <optimized out>
        qiov = <optimized out>
        sector_num = <optimized out>
#6  virtio_blk_submit_multireq (blk=0x564cf9f80250,
address@hidden) at qemu/hw/block/virtio-blk.c:391
        i = <optimized out>
        start = <optimized out>
        num_reqs = <optimized out>
        niov = <optimized out>
        nb_sectors = <optimized out>
        max_xfer_len = <optimized out>
        sector_num = <optimized out>
#7  0x0000564cf7ab38c2 in virtio_blk_handle_vq (s=0x564cf9e51268,
vq=<optimized out>) at qemu/hw/block/virtio-blk.c:593
        req = 0x0
        mrb = {reqs = {0x564cfb8e8c30, 0x564cfb7bc290, 0x0 <repeats 30
times>}, num_reqs = 2, is_write = false}
#8  0x0000564cf7addcf5 in virtio_queue_notify_vq (vq=0x564cfa000be0) at
        vdev = 0x564cf9e51268
#9  0x0000564cf7d19980 in aio_dispatch (ctx=0x564cf9e42f40) at
        tmp = <optimized out>
        revents = <optimized out>
        node = 0x7f3e54015030
        progress = false
#10 0x0000564cf7d0eecd in aio_ctx_dispatch (source=<optimized out>,
callback=<optimized out>, user_data=<optimized out>) at qemu/async.c:233
        ctx = <optimized out>
#11 0x00007f3e781d7f07 in g_main_context_dispatch () from
No symbol table info available.
#12 0x0000564cf7d1803b in glib_pollfds_poll () at qemu/main-loop.c:213
        context = 0x564cf9e44800
        pfds = <optimized out>
#13 os_host_main_loop_wait (timeout=<optimized out>) at qemu/main-loop.c:258
        ret = 2
        spin_counter = 2
#14 main_loop_wait (nonblocking=<optimized out>) at qemu/main-loop.c:506
        ret = 2
        timeout = 1000
        timeout_ns = <optimized out>
#15 0x0000564cf7a4c91c in main_loop () at qemu/vl.c:1934
        nonblocking = <optimized out>
        last_io = 0
#16 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized
out>) at qemu/vl.c:4658

Maybe bdrv_drained_begin() is supposed to work like this and to let
this request through, but that would be pretty counter-intuitive.

