qemu-devel

Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error


From: Ladi Prosek
Subject: Re: [Qemu-devel] Guest unresponsive after Virtqueue size exceeded error
Date: Fri, 16 Jun 2017 08:58:17 +0200

Hi,

On Wed, Jun 14, 2017 at 11:56 PM, Fernando Casas Schössow
<address@hidden> wrote:
> Hi there,
>
> I recently migrated a Hyper-V host to qemu/kvm running on Alpine Linux 3.6.1 
> (kernel 4.9.30 -with grsec patches- and qemu 2.8.1).
>
> Almost on a daily basis at least one of the guests shows the following 
> error in the log, and then it needs to be terminated and restarted to 
> recover:
>
> qemu-system-x86_64: Virtqueue size exceeded
>
> It's not always the same guest, and the error appears on both Linux 
> (CentOS 7.3) and Windows (2012R2) guests.
> As soon as this error appears the guest is not really working anymore. It may 
> respond to ping, or you can even try to log in, but then everything is very 
> slow or completely unresponsive. Restarting the guest from within the guest 
> OS doesn't work either, and the only thing I can do is terminate it (virsh 
> destroy) and start it again until the next failure.
>
> In Windows guests the error seems to be related to disk:
> "Reset to device, \Device\RaidPort2, was issued" and the source is viostor
>
> And in Linux guests the error is always (with the process and pid changing):
>
> INFO: task <process>:<pid> blocked for more than 120 seconds
>
> But unfortunately I was not able to find any other indication of a problem in 
> the guest logs or in the host logs except for the error regarding the 
> virtqueue size. The problem happens at different times of day and I 
> haven't found any pattern yet.
>
> All the Windows guests are using virtio drivers version 126 and all Linux 
> guests are CentOS 7.3 using the latest kernel available in the distribution 
> (3.10.0-514.21.1). They all run qemu-guest-agent as well.
> All the guest disks are qcow2 images with cache=none and aio=threads 
> (I tried aio=native before, with the same results).
>
> Example qemu command for a Linux guest:
>
> /usr/bin/qemu-system-x86_64 -name guest=DOCKER01,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-24-DOCKER01/master-key.aes
>  -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
> IvyBridge,+ds,+acpi,+ss,+ht,+tm,+pbe,+dtes64,+monitor,+ds_cpl,+vmx,+smx,+est,+tm2,+xtpr,+pdcm,+pcid,+osxsave,+arat,+xsaveopt
>  -drive 
> file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
>  -drive 
> file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
> -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
> 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-24-DOCKER01/monitor.sock,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
> -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
> PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4
>  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
> file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads
>  -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>  -netdev tap,fd=35,id=hostnet0,vhost=on,vhostfd=45 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3
>  -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -chardev 
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-24-DOCKER01/org.qemu.guest_agent.0,server,nowait
>  -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>  -chardev spicevmc,id=charchannel1,name=vdagent -device 
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
>  -device usb-tablet,id=input0,bus=usb.0,port=1 -spice 
> port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device 
> qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
>  -chardev spicevmc,id=charredir0,name=usbredir -device 
> usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
> spicevmc,id=charredir1,name=usbredir -device 
> usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
> rng-random,id=objrng0,filename=/dev/random -device 
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on
>
> For what it's worth, the same guests were working fine for years on Hyper-V on 
> the same hardware (Intel Xeon E3, 32GB RAM, Supermicro mainboard, 6x3TB 
> Western Digital Red disks and 6x120GB Kingston V300 SSDs, all connected to an 
> LSI LSISAS2008 controller).
> Except for this stability issue, which I hope to solve, everything else is 
> working great and outperforming Hyper-V.
>
> Any ideas, thoughts or suggestions to try to narrow down the problem?

Would you be able to enhance the error message and rebuild QEMU?

--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,7 +856,8 @@ void *virtqueue_pop(VirtQueue *vq, size_t sz)
     max = vq->vring.num;

     if (vq->inuse >= vq->vring.num) {
-        virtio_error(vdev, "Virtqueue size exceeded");
+        virtio_error(vdev, "Virtqueue %u device %s size exceeded",
+                     vq->queue_index, vdev->name);
         goto done;
     }

This would at least confirm the theory that it's caused by virtio-blk-pci.
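If attaching a debugger is easier for you than rebuilding, something along these lines should also identify the device. This is only a sketch, assuming your qemu build has debug symbols installed; virtio_error is the function shown in the patch above:

```shell
# Attach gdb to the running QEMU process and stop when the error is raised.
gdb -p "$(pidof qemu-system-x86_64)" \
    -ex 'break virtio_error' \
    -ex 'continue'
# Once the breakpoint hits, walk up to the virtqueue_pop frame and inspect
# which virtqueue and device tripped the check:
#   (gdb) frame 1
#   (gdb) print vq->queue_index
#   (gdb) print vdev->name
```

Note that the guest is paused while gdb has it stopped, so this is only practical if you can tolerate the extra downtime on top of the hang.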

If rebuilding is not feasible, I would start by removing other virtio
devices -- particularly the balloon, which has had quite a few virtio
related bugs fixed recently.
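For the balloon specifically, since your guests are libvirt-managed, the device can be removed by editing the domain definition rather than the qemu command line. A sketch, using the DOCKER01 domain from your example (the memballoon element is standard libvirt domain XML):

```shell
virsh edit DOCKER01      # replace the existing memballoon element with:
                         #   <memballoon model='none'/>
virsh shutdown DOCKER01  # a full power-off is needed, not a reboot;
                         # wait for it to shut down, then:
virsh start DOCKER01
```

After the fresh start, the virtio-balloon-pci device should no longer appear on the generated qemu command line.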

Does your environment involve VM migrations or saving/resuming, or
does the crashing QEMU process always run the VM from its boot?
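One way to check this on the host: a QEMU process that receives a migration or a restored image is started with an -incoming argument, and libvirt records the full command line in the per-domain log. Assuming the default log location and the domain name from your example:

```shell
# If this prints anything, the process was started to receive a migration
# or a save/restore rather than a fresh boot:
grep -- '-incoming' /var/log/libvirt/qemu/DOCKER01.log
```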

Thanks!

> Thanks in advance, and sorry for the long email, but I wanted to be as 
> descriptive as possible.
>
> Fer


