[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCHv2 1/3] virtio: add missing mb() on notification
From: |
Michael S. Tsirkin |
Subject: |
Re: [Qemu-devel] [PATCHv2 1/3] virtio: add missing mb() on notification |
Date: |
Tue, 24 Apr 2012 17:42:58 +0300 |
On Tue, Apr 24, 2012 at 06:24:14PM +0400, malc wrote:
> On Tue, 24 Apr 2012, Paolo Bonzini wrote:
>
> > Il 24/04/2012 16:20, Michael S. Tsirkin ha scritto:
> > > On Tue, Apr 24, 2012 at 03:46:25PM +0200, Paolo Bonzini wrote:
> > >> Il 23/04/2012 15:19, Michael S. Tsirkin ha scritto:
> > >>> During normal operation, virtio first writes a used index
> > >>> and then checks whether it should interrupt the guest
> > >>> by reading guest avail index/flag values.
> > >>>
> > >>> Guest does the reverse: writes the index/flag,
> > >>> then checks the used ring.
> > >>>
> > >>> The ordering is important: if host avail flag read bypasses the used
> > >>> index write, we could in effect get this timing:
> > >>>
> > >>> host avail flag read
> > >>> guest enable interrupts: avail flag write
> > >>> guest check used ring: ring is empty
> > >>> host used index write
> > >>>
> > >>> which results in a lost interrupt: guest will never be notified
> > >>> about the used ring update.
> > >>>
> > >>> This actually can happen when using kvm with an io thread,
> > >>> such that the guest vcpu and qemu run on different host cpus,
> > >>> and this has actually been observed in the field
> > >>> (but only seems to trigger on very specific processor types)
> > >>> with userspace virtio: vhost has the necessary smp_mb()
> > >>> in place to prevent the regordering, so the same workload stalls
> > >>> forever waiting for an interrupt with vhost=off but works
> > >>> fine with vhost=on.
> > >>>
> > >>> Insert an smp_mb barrier operation in userspace virtio to
> > >>> ensure the correct ordering.
> > >>> Applying this patch fixed the race condition we have observed.
> > >>> Tested on x86_64. I checked the code generated by the new macro
> > >>> for i386 and ppc but didn't run virtio.
> > >>>
> > >>> Signed-off-by: Michael S. Tsirkin <address@hidden>
> > >>> ---
> > >>> hw/virtio.c | 2 ++
> > >>> qemu-barrier.h | 23 ++++++++++++++++++++---
> > >>> 2 files changed, 22 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/hw/virtio.c b/hw/virtio.c
> > >>> index f805790..6449746 100644
> > >>> --- a/hw/virtio.c
> > >>> +++ b/hw/virtio.c
> > >>> @@ -693,6 +693,8 @@ static bool vring_notify(VirtIODevice *vdev,
> > >>> VirtQueue *vq)
> > >>> {
> > >>> uint16_t old, new;
> > >>> bool v;
> > >>> + /* We need to expose used array entries before checking used
> > >>> event. */
> > >>> + mb();
> > >>
> > >> mb() vs. smp_mb()?
> > >
> > > rhel used wmb() everywhere so this keeps it consistent.
> > > upstream switched to smp_wmb so I added smp_mb there.
> > >
> > >>> /* Always notify when queue is empty (when feature acknowledge) */
> > >>> if (((vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
> > >>> !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx)) {
> > >>> diff --git a/qemu-barrier.h b/qemu-barrier.h
> > >>> index c11bb2b..f6722a8 100644
> > >>> --- a/qemu-barrier.h
> > >>> +++ b/qemu-barrier.h
> > >>> @@ -4,7 +4,7 @@
> > >>> /* Compiler barrier */
> > >>> #define barrier() asm volatile("" ::: "memory")
> > >>>
> > >>> -#if defined(__i386__) || defined(__x86_64__)
> > >>> +#if defined(__i386__)
> > >>>
> > >>> /*
> > >>> * Because of the strongly ordered x86 storage model, wmb() is a nop
> > >>> @@ -13,15 +13,31 @@
> > >>> * load/stores from C code.
> > >>> */
> > >>> #define smp_wmb() barrier()
> > >>> +/*
> > >>> + * We use GCC builtin if it's available, as that can use
> > >>> + * mfence on 32 bit as well, e.g. if built with -march=pentium-m.
> > >>> + * However, on i386, there seem to be known bugs as recently as 4.3.
> > >>> + * */
> > >>
> > >> Do you know what those bugs are? Either add a pointer, or there is no
> > >> reason to have cruft that is only backed by hearsay.
> > >
> > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
> > > I'll add this link in the commit log.
> >
> > Ok, thanks. I modified my "atomics" branch to add it too.
> >
> > >>> +#if defined(_GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 4
>
> __GNUC__ perhaps?
Turns out gcc defines both. Why? I have no idea - probably
better to switch to __GNUC__... Anyone?
> > >>> +#define smp_mb() __sync_synchronize()
> > >>> +#else
> > >>> +#define smp_mb() asm volatile("lock; addl $0,0(%%esp) " ::: "memory")
> > >>> +#endif
> > >>> +
> > >>> +#elif defined(__x86_64__)
> > >>> +
> > >>> +#define smp_wmb() barrier()
> > >>> +#define smp_mb() asm volatile("mfence" ::: "memory")
> > >>>
> > >>> #elif defined(_ARCH_PPC)
> > >>>
> > >>> /*
> > >>> - * We use an eieio() for a wmb() on powerpc. This assumes we don't
> > >>> + * We use an eieio() for wmb() and mb() on powerpc. This assumes we
> > >>> don't
> > >>> * need to order cacheable and non-cacheable stores with respect to
> > >>> * each other
> > >>> */
> > >>> #define smp_wmb() asm volatile("eieio" ::: "memory")
> > >>> +#define smp_mb() asm volatile("eieio" ::: "memory")
> > >>
> > >> smp_mb() is hwsync under PPC,
> > >
> > > This one?
> > > __asm__ __volatile__ ("sync" : : : "memory")
> > >
> > >> but I would just trust GCC.
> > >>
> > >> Paolo
> > >
> > > __sync_synchronize? Unfortunately it's still pretty new.
> >
> > So is KVM on PPC. :)
> >
> > Paolo
> >
>
> --
> mailto:address@hidden
[Qemu-devel] [PATCHv2 2/3] virtio: add missing mb() on enable notification, Michael S. Tsirkin, 2012/04/23
[Qemu-devel] [PATCHv2 3/3] virtio: order index/descriptor reads, Michael S. Tsirkin, 2012/04/23