Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
From: Ladi Prosek
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Date: Fri, 16 Jun 2017 08:58:17 +0200
Hi,
On Wed, Jun 14, 2017 at 11:56 PM, Fernando Casas Schössow
<address@hidden> wrote:
> Hi there,
>
> I recently migrated a Hyper-V host to qemu/kvm running on Alpine Linux 3.6.1
> (kernel 4.9.30 with grsec patches, and qemu 2.8.1).
>
> Almost on a daily basis, at least one of the guests shows the following
> error in the log, and then it needs to be terminated and restarted to recover
> it:
>
> qemu-system-x86_64: Virtqueue size exceeded
>
> It is not always the same guest, and the error appears in both Linux
> (CentOS 7.3) and Windows (2012R2) guests.
> As soon as this error appears, the guest stops working properly. It may
> respond to ping, and you can even try to log in, but then everything is very
> slow or completely unresponsive. Restarting the guest from within the guest OS
> does not work either, and the only thing I can do is terminate it (virsh
> destroy) and start it again until the next failure.
>
> In Windows guests the error seems to be disk-related:
> "Reset to device, \Device\RaidPort2, was issued" and the source is viostor
>
> And in Linux guests the error is always (with the process and pid changing):
>
> INFO: task <process>:<pid> blocked for more than 120 seconds
>
> Unfortunately I was not able to find any other indication of a problem in
> the guests' logs or in the host logs, except for the error regarding the
> virtqueue size. The problem happens at different times of day and I
> haven't found any pattern yet.
>
> All the Windows guests use virtio drivers version 126, and all Linux
> guests are CentOS 7.3 using the latest kernel available in the distribution
> (3.10.0-514.21.1). They all run qemu-guest-agent as well.
> All the guest disks are qcow2 images with cache=none and aio=threads
> (I tried aio=native before, with the same results).
>
> Example qemu command for a Linux guest:
>
> /usr/bin/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-24-DOCKER01/master-key.aes
> -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu
> IvyBridge,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,+arat,+xsaveopt
> -drive
> file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
> -drive
> file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1
> -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid
> 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-24-DOCKER01/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
> -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global
> PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4
> -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive
> file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=45 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-24-DOCKER01/org.qemu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
> -chardev spicevmc,id=charchannel1,name=vdagent -device
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
> -device usb-tablet,id=input0,bus=usb.0,port=1 -spice
> port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
> -chardev spicevmc,id=charredir0,name=usbredir -device
> usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev
> spicevmc,id=charredir1,name=usbredir -device
> usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object
> rng-random,id=objrng0,filename=/dev/random -device
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on
>
> For what it's worth, the same guests worked fine for years on Hyper-V on
> the same hardware (Intel Xeon E3, 32GB RAM, Supermicro mainboard, 6x3TB
> Western Digital Red disks and 6x120GB Kingston V300 SSDs, all connected to an
> LSI LSISAS2008 controller).
> Except for this stability issue, which I hope to solve, everything else is
> working great and outperforming Hyper-V.
>
> Any ideas, thoughts or suggestions to try to narrow down the problem?
Would you be able to enhance the error message and rebuild QEMU?
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,7 +856,7 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;
 
     if (vq->inuse >= vq->vring.num) {
-        virtio_error(vdev, "Virtqueue size exceeded");
+        virtio_error(vdev, "Virtqueue %u device %s size exceeded", vq->queue_index, vdev->name);
         goto done;
     }
 
This would at least confirm the theory that it's caused by virtio-blk-pci.
If rebuilding is not feasible, I would start by removing the other virtio
devices, particularly the balloon, which has had quite a few virtio-related
bugs fixed recently.
Does your environment involve VM migrations or saving/resuming, or
does the crashing QEMU process always run the VM from boot?
Thanks!
> Thanks in advance, and sorry for the long email, but I wanted to be as
> descriptive as possible.
>
> Fer