Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()


From: John Snow
Subject: Re: [Qemu-devel] Regression from 2.8: stuck in bdrv_drain()
Date: Wed, 12 Apr 2017 17:38:17 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0


On 04/12/2017 04:46 PM, Jeff Cody wrote:
> 
> This occurs on v2.9.0-rc4, but not on v2.8.0.
> 
> When running QEMU with an iothread and then performing a block mirror, if
> we do a system reset after the BLOCK_JOB_READY event has been emitted,
> QEMU becomes deadlocked.
> 
> The block job is neither paused nor cancelled, so we are stuck in the
> while loop in block_job_detach_aio_context():
> 
> static void block_job_detach_aio_context(void *opaque)
> {
>     BlockJob *job = opaque;
> 
>     /* In case the job terminates during aio_poll()... */
>     block_job_ref(job);
> 
>     block_job_pause(job);
> 
>     while (!job->paused && !job->completed) {
>         block_job_drain(job);
>     }
> 

Looks like when block_job_drain() calls block_job_enter() from this context
(the main thread, since we're trying to do a system_reset...), we cannot
enter the coroutine directly because we're in the wrong AioContext, so we
schedule an entry instead with

aio_co_schedule(ctx, co);

But that entry never happens, so the job never wakes up, never makes enough
progress in its coroutine to pause gracefully, and we wedge here; the
dispatch in question is sketched just below the quoted function.

>     block_job_unref(job);
> }
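
For reference, the dispatch that picks between direct entry and
aio_co_schedule() looks roughly like this. This is only a minimal sketch of
the v2.9-era wakeup path: qemu_get_current_aio_context(), aio_co_schedule()
and qemu_coroutine_enter() are the real APIs, but the function itself is an
illustration, not the actual block_job_enter() source:

/* Sketch only, not verbatim QEMU code. */
static void wake_job_coroutine_sketch(AioContext *ctx, Coroutine *co)
{
    if (ctx != qemu_get_current_aio_context()) {
        /* Wrong thread for this context (here: the main thread waking a
         * job that lives in the iothread's AioContext), so queue a
         * deferred entry instead of entering the coroutine directly. */
        aio_co_schedule(ctx, co);
    } else {
        /* We own this AioContext, so direct entry is safe. */
        qemu_coroutine_enter(co);
    }
}

The scheduled entry is only dispatched when ctx's event loop runs again;
until it does, the job coroutine never reaches a pause point, job->paused
never becomes true, and the while loop above spins forever.
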
> 

> 
> Reproducer script and QAPI commands:
> 
> # QEMU script:
> gdb --args /home/user/deploy-${1}/bin/qemu-system-x86_64 -enable-kvm -smp 4 \
>     -object iothread,id=iothread0 \
>     -drive file=${2},if=none,id=drive-virtio-disk0,aio=native,cache=none,discard=unmap \
>     -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,iothread=iothread0 \
>     -m 1024 -boot menu=on -qmp stdio \
>     -drive file=${3},if=none,id=drive-data-disk0,format=qcow2,cache=none,aio=native,werror=stop,rerror=stop \
>     -device virtio-blk-pci,drive=drive-data-disk0,id=data-disk0,iothread=iothread0,bus=pci.0,addr=0x7
> 
> 
> # QAPI commands:
> { "execute": "drive-mirror", "arguments": { "device": "drive-data-disk0", 
> "target": "/home/user/sn1", "format": "qcow2", "mode": "absolute-paths", 
> "sync": "full", "speed": 1000000000, "on-source-error": "stop", 
> "on-target-error": "stop" } }
> 
> 
> # after BLOCK_JOB_READY, do system reset
> { "execute": "system_reset" }
> 
> 
> 
> 
> 
> gdb bt:
> 
> (gdb) bt
> #0  0x0000555555aa79f3 in bdrv_drain_recurse (address@hidden) at block/io.c:164
> #1  0x0000555555aa825d in bdrv_drained_begin (address@hidden) at block/io.c:231
> #2  0x0000555555aa8449 in bdrv_drain (bs=0x55555783e900) at block/io.c:265
> #3  0x0000555555a9c356 in blk_drain (blk=<optimized out>) at block/block-backend.c:1383
> #4  0x0000555555aa3cfd in mirror_drain (job=<optimized out>) at block/mirror.c:1000
> #5  0x0000555555a66e11 in block_job_detach_aio_context (opaque=0x555557a19a40) at blockjob.c:142
> #6  0x0000555555a62f4d in bdrv_detach_aio_context (address@hidden) at block.c:4357
> #7  0x0000555555a63116 in bdrv_set_aio_context (address@hidden, address@hidden) at block.c:4418
> #8  0x0000555555a9d326 in blk_set_aio_context (blk=0x5555566db520, new_context=0x55555668bc20) at block/block-backend.c:1662
> #9  0x00005555557e38da in virtio_blk_data_plane_stop (vdev=<optimized out>) at /home/jcody/work/upstream/qemu-kvm/hw/block/dataplane/virtio-blk.c:262
> #10 0x00005555559f9d5f in virtio_bus_stop_ioeventfd (address@hidden) at hw/virtio/virtio-bus.c:246
> #11 0x00005555559fa49b in virtio_bus_stop_ioeventfd (address@hidden) at hw/virtio/virtio-bus.c:238
> #12 0x00005555559f6a18 in virtio_pci_stop_ioeventfd (proxy=0x555558300510) at hw/virtio/virtio-pci.c:348
> #13 0x00005555559f6a18 in virtio_pci_reset (qdev=<optimized out>) at hw/virtio/virtio-pci.c:1872
> #14 0x00005555559139a9 in qdev_reset_one (dev=<optimized out>, opaque=<optimized out>) at hw/core/qdev.c:310
> #15 0x0000555555916738 in qbus_walk_children (bus=0x55555693aa30, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> #16 0x0000555555913318 in qdev_walk_children (dev=0x5555569387d0, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/qdev.c:617
> #17 0x0000555555916738 in qbus_walk_children (bus=0x555556756f70, pre_devfn=0x0, pre_busfn=0x0, post_devfn=0x5555559139a0 <qdev_reset_one>, post_busfn=0x5555559120f0 <qbus_reset_one>, opaque=0x0) at hw/core/bus.c:59
> #18 0x00005555559168ca in qemu_devices_reset () at hw/core/reset.c:69
> #19 0x000055555581fcbb in pc_machine_reset () at /home/jcody/work/upstream/qemu-kvm/hw/i386/pc.c:2234
> #20 0x00005555558a4d96 in qemu_system_reset (report=<optimized out>) at vl.c:1697
> #21 0x000055555577157a in main_loop_should_exit () at vl.c:1865
> #22 0x000055555577157a in main_loop () at vl.c:1902
> #23 0x000055555577157a in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4709
> 
> 
> -Jeff
> 

Here's a backtrace for an unoptimized build showing all threads:

https://paste.fedoraproject.org/paste/lLnm8jKeq2wLKF6yEaoEM15M1UNdIGYhyRLivL9gydE=


--js


