qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size ex


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [Qemu-block] Guest unresponsive after Virtqueue size exceeded error
Date: Fri, 22 Feb 2019 16:37:36 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

* Stefan Hajnoczi (address@hidden) wrote:
> On Fri, Feb 22, 2019 at 12:57 PM Fernando Casas Schössow
> <address@hidden> wrote:
> 
> I have CCed Natanael Copa, qemu package maintainer in Alpine Linux.
> 
> Fernando: Can you confirm that the bug occurs with an unmodified
> Alpine Linux qemu binary?
> 
> Richard: Commit 7db2145a6826b14efceb8dd64bfe6ad8647072eb ("bswap: Add
> host endian unaligned access functions") introduced the unaligned
> memory access functions in question here.  Please see below for
> details on the bug - basically QEMU code assumes they are atomic, but
> that is not guaranteed :(.  Any ideas for how to fix this?


Why does vring_avail_idx use virtio_ld*u*w_phys_cached?
(similar for vring_avail_ring).
There's no generically safe way to do unaligned atomic loads - don't we know
in virtio that these are naturally aligned?

Dave

> Natanael: It seems likely that the qemu package in Alpine Linux
> suffers from a compilation issue resulting in a broken QEMU.  It may
> be necessary to leave the compiler optimization flag alone in APKBUILD
> to work around this problem.
> 
> Here are the details.  QEMU relies on the compiler turning
> memcpy(&dst, &src, 2) turning into a load instruction in
> include/qemu/bswap.h:lduw_he_p() (explanation below):
> 
> /* Any compiler worth its salt will turn these memcpy into native unaligned
>    operations.  Thus we don't need to play games with packed attributes, or
>    inline byte-by-byte stores.  */
> 
> static inline int lduw_he_p(const void *ptr)
> {
>     uint16_t r;
>     memcpy(&r, ptr, sizeof(r));
>     return r;
> }
> 
> Here is the disassembly snippet of virtqueue_pop() from Fedora 29 that
> shows the load instruction:
> 
>   398166:       0f b7 42 02             movzwl 0x2(%rdx),%eax
>   39816a:       66 89 43 32             mov    %ax,0x32(%rbx)
> 
> Here is the instruction sequence in the Alpine Linux binary:
> 
>   455562:    ba 02 00 00 00           mov    $0x2,%edx
>   455567:    e8 74 24 f3 ff           callq  3879e0 <address@hidden>
>   45556c:    0f b7 44 24 42           movzwl 0x42(%rsp),%eax
>   455571:    66 41 89 47 32           mov    %ax,0x32(%r15)
> 
> It's calling memcpy instead of using a load instruction.
> 
> Fernando found that QEMU's virtqueue_pop() function sees bogus values
> when loading a 16-bit guest RAM location.  Paolo figured out that the
> bogus value can be produced by memcpy() when another thread is
> updating the 16-bit memory location simultaneously.  This is a race
> condition between one thread loading the 16-bit value and another
> thread storing it (in this case a guest vcpu thread).  Sometimes
> memcpy() may load one old byte and one new byte, resulting in a bogus
> value.
> 
> The symptom that Fernando experienced is a "Virtqueue size exceeded"
> error message from QEMU and then the virtio-blk or virtio-scsi device
> stops working.  This issue potentially affects other device emulation
> code in QEMU as well, not just virtio devices.
> 
> For the time being, I suggest tweaking the APKBUILD so the memcpy() is
> not generated.  Hopefully QEMU can make the code more portable in the
> future so the compiler always does the expected thing, but this may
> not be easily possible.
> 
> Stefan
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]