qemu-devel

Re: bdrv_drained_begin deadlock with io-threads


From: Kevin Wolf
Subject: Re: bdrv_drained_begin deadlock with io-threads
Date: Wed, 1 Apr 2020 20:12:56 +0200
User-agent: Mutt/1.12.1 (2019-06-15)

Am 01.04.2020 um 17:37 hat Dietmar Maurer geschrieben:
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > reproduce it)?
> > 
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
> 
> Please can you post the command line args of your VM? I use something like
> 
> ./x86_64-softmmu/qemu-system-x86_64 -chardev
> 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m
> 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> -device
> 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> -machine "type=pc,accel=kvm"
> 
> Do you also run "stress-ng -d 5" inside the VM?

I'm not using the exact same test case, but something that I thought
would be similar enough. Specifically, I run the script below, which
boots from a RHEL 8 CD. In the rescue shell, I run 'dd if=/dev/zero
of=/dev/sda' while the script keeps starting and cancelling backup jobs
in the background.

Anyway, I finally managed to bisect my problem now (did it wrong the
first time) and got this result:

00e30f05de1d19586345ec373970ef4c192c6270 is the first bad commit
commit 00e30f05de1d19586345ec373970ef4c192c6270
Author: Vladimir Sementsov-Ogievskiy <address@hidden>
Date:   Tue Oct 1 16:14:09 2019 +0300

    block/backup: use backup-top instead of write notifiers

    Drop write notifiers and use filter node instead.

    = Changes =

    1. Add filter-node-name argument for backup qmp api. We have to do it
    in this commit, as 257 needs to be fixed.

    2. There are no more write notifiers here, so is_write_notifier
    parameter is dropped from block-copy paths.

    3. To sync with in-flight requests at job finish we now have drained
    removing of the filter, we don't need rw-lock.

    4. Block-copy is now using BdrvChildren instead of BlockBackends

    5. As backup-top owns these children, we also move block-copy state
    into backup-top's ownership.

    [...]


That's a pretty big change, and I'm not sure how it's related to
completed requests hanging in the thread pool instead of reentering the
file-posix coroutine. But I also tested it enough that I'm confident
it's really the first bad commit.

Maybe you want to check whether your problem starts at the same commit?
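
One way to check this could look like the sketch below: test the parent of
the commit first (expected to be fine if the bisect result applies to your
case), then the commit itself. It assumes the current directory is a QEMU
git checkout with an in-tree build, and ./reproduce.sh is a hypothetical
name standing in for whatever reproducer you use. By default it only prints
the commands; set DRY_RUN=0 to really run them.

```shell
#!/bin/bash
# Sketch only: check whether a hang first appears at the bisected commit.
# Assumptions: run from a QEMU git checkout with an in-tree build;
# ./reproduce.sh is a hypothetical placeholder for your own reproducer.

BAD=00e30f05de1d19586345ec373970ef4c192c6270
DRY_RUN=${DRY_RUN:-1}   # 1: only print the commands; 0: actually run them

# Revisions worth testing: the parent of the commit first, then the
# commit itself.
revisions_to_test() {
    printf '%s\n' "${BAD}^" "${BAD}"
}

# Print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = 0 ]; then
        "$@"
    fi
}

# Check out one revision, rebuild, and start the reproducer.
test_revision() {
    run git checkout "$1" &&
    run make &&
    run ./reproduce.sh
}

for rev in $(revisions_to_test); do
    test_revision "$rev" || break
done
```

If the reproducer hangs on a revision, it has to be interrupted by hand, so
in practice you would probably run the two revisions one at a time. A
rebuild after switching revisions may also need more than a plain make.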

Kevin


#!/bin/bash

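# Emit a QMP command stream on stdout: negotiate capabilities once, then
# start and cancel a backup job in an endless loop.
qmp() {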
cat <<EOF
{'execute':'qmp_capabilities'}
EOF

while true; do
cat <<EOF
{ "execute": "drive-backup", "arguments": {
  "job-id":"drive_image1","device": "drive_image1", "sync": "full",
  "target": "/tmp/backup.raw" } }
EOF
sleep 1
cat <<EOF
{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
EOF
sleep 2
done
}

# Create a 4G qcow2 image and write to its first 2G so that the backup
# job has data to copy.
./qemu-img create -f qcow2 /tmp/test.qcow2 4G
for i in $(seq 0 1); do echo "write ${i}G 1G"; done | ./qemu-io /tmp/test.qcow2

qmp | x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -machine pc \
    -m 1G \
    -object 'iothread,id=iothread-virtioscsi0' \
    -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
    -blockdev node-name=my_drive,driver=file,filename=/tmp/test.qcow2 \
    -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
    -device scsi-hd,drive=drive_image1,id=image1 \
    -cdrom ~/images/iso/RHEL-8.0-20190116.1-x86_64-dvd1.iso \
    -boot d \
    -qmp stdio -monitor vc



