[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size ex

From: Fernando Casas Schössow
Subject: Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error
Date: Tue, 19 Feb 2019 07:26:51 +0000

Problem reproduced with virtio-scsi as well on the same guest, this time it 
took less than a day.
Information from the log file:

vdev 0x55823f119f90 ("virtio-scsi")
vq 0x55823f122e80 (idx 2)
inuse 128 vring.num 128
old_shadow_avail_idx 58367 last_avail_idx 58113 avail_idx 58367
avail 0x3de8a800 avail_idx (cache bypassed) 58113
used_idx 57985
2019-02-19 04:20:43.291+0000: shutting down, reason=crashed

Got the dump file as well, including guest memory. Size is around 486MB after 
Is there any other information I should collect to progress the investigation?


On lun, feb 18, 2019 at 8:21 AM, Fernando Casas Schössow <address@hidden> wrote:
It took a few days but last night the problem was reproduced.
This is the information from the log:

vdev 0x55f261d940f0 ("virtio-blk")
vq 0x55f261d9ee40 (idx 0)
inuse 128 vring.num 128
old_shadow_avail_idx 58874 last_avail_idx 58625 avail_idx 58874
avail 0x3d87a800 avail_idx (cache bypassed) 58625
used_idx 58497
2019-02-18 03:20:08.605+0000: shutting down, reason=crashed

The dump file, including guest memory, was generated successfully (after gzip 
the file is around 492MB).
I switched the guest now to virtio-scsi to get the information and dump with 
this setup as well.

How should we proceed?


On lun, feb 11, 2019 at 4:17 AM, Stefan Hajnoczi <address@hidden> wrote:
Thanks for collecting the data! The fact that both virtio-blk and virtio-scsi 
failed suggests it's not a virtqueue element leak in the virtio-blk or 
virtio-scsi device emulation code. The hung task error messages from inside the 
guest are a consequence of QEMU hitting the "Virtqueue size exceeded" error. 
QEMU refuses to process further requests after the error, causing tasks inside 
the guest to get stuck on I/O. I don't have a good theory regarding the root 
cause. Two ideas: 1. The guest is corrupting the vring or submitting more 
requests than will fit into the ring. Somewhat unlikely because it happens with 
both Windows and Linux guests. 2. QEMU's virtqueue code is buggy, maybe the 
memory region cache which is used for fast guest RAM accesses. Here is an 
expanded version of the debug patch which might help identify which of these 
scenarios is likely. Sorry, it requires running the guest again! This time 
let's make QEMU dump core so both QEMU state and guest RAM are captured for 
further debugging. That way it will be possible to extract more information 
using gdb without rerunning. Stefan --- diff --git a/hw/virtio/virtio.c 
b/hw/virtio/virtio.c index a1ff647a66..28d89fcbcb 100644 --- 
a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -866,6 +866,7 @@ void 
*virtqueue_pop(VirtQueue *vq, size_t sz) return NULL; } rcu_read_lock(); + 
uint16_t old_shadow_avail_idx = vq->shadow_avail_idx; if 
(virtio_queue_empty_rcu(vq)) { goto done; } @@ -879,6 +880,12 @@ void 
*virtqueue_pop(VirtQueue *vq, size_t sz) max = vq->vring.num; if (vq->inuse >= 
vq->vring.num) { + fprintf(stderr, "vdev %p (\"%s\")\n", vdev, vdev->name); + 
fprintf(stderr, "vq %p (idx %u)\n", vq, (unsigned int)(vq - vdev->vq)); + 
fprintf(stderr, "inuse %u vring.num %u\n", vq->inuse, vq->vring.num); + 
fprintf(stderr, "old_shadow_avail_idx %u last_avail_idx %u avail_idx %u\n", 
old_shadow_avail_idx, vq->last_avail_idx, vq->shadow_avail_idx); + 
fprintf(stderr, "avail %#" HWADDR_PRIx " avail_idx (cache bypassed) %u\n", 
vq->vring.avail, virtio_lduw_phys(vdev, vq->vring.avail + offsetof(VRingAvail, 
idx))); + fprintf(stderr, "used_idx %u\n", vq->used_idx); + abort(); /* <--- 
core dump! */ virtio_error(vdev, "Virtqueue size exceeded"); goto done; }

reply via email to

[Prev in Thread] Current Thread [Next in Thread]