qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)


From: Stefan Hajnoczi
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Thu, 2 Jan 2020 11:05:04 +0000

On Mon, Dec 30, 2019 at 01:10:15PM +0300, ASM wrote:
> > It could be a bug in QEMU's e1000 emulation - maybe it's not doing
> > things in the correct order and causes a race condition with the DPDK
> > polling driver - or it could be a bug in the DPDK e1000 driver regarding
> > the order in which the descriptor ring and RX Head/Tail MMIO registers
> > are updated.
> 
> 
> What did I understand:
> * DPDK and Kernel drivers work like simular with ring. It don't
> analize Head, but check STATUS.
> This is a bit strange but completely correct driver behavior. If the
> driver writes to memory, it expects
> this value to be written. The problem is definitely not in the DPDK
> and in the kernel driver.
> * This problem appears on KVM, but not appears on tcg.
> * Most similar to a bug in QEMU e1000 emulation. The e1000 emulation
> read and writes to some
> memory and same times, same as dpdk driver.
> 
> 
> As I understand it, KVM explicitly prohibits access to shared memory.
> It is obvious that us need to
> protect (RCU) all STATUS registers of all buffers. There can be a lot
> of buffers and they can be
> scattered throughout the memory.

I don't agree with this reasoning because these descriptor rings are
designed to support simultaneous access by the driver and the device (if
done with care!).  What QEMU and the driver are attempting is typical.

The important thing for safe shared memory access is that both the
driver and the device know who is allowed to write to a memory location
at a point in time.  As long as both the driver and device don't write
to the same location it is possible to safely use the data structure.

The typical pattern that a driver or device uses to publish information
is:

  1. Fill in all fields but don't set the STATUS bit yet.
  2. Memory barrier or other memory ordering directive to ensure that
     fields have been written.  This operation might be implicit.
  3. Set the STATUS bit in a separate operation.

On the read side there should be:

  1. Load the STATUS field and check the bit.  If it's clear then try
     again.
  2. Memory barrier or other memory ordering directive to ensure that
     fields have been written.  This operation might be implicit.
  3. Load all the fields.

Can you explain why the STATUS fields need to be protected?  My
understanding is that the STATUS field is owned by the device at this
point in time.  The device writes to the STATUS field and the driver
polls it - this is safe.

I think the issue is bugs in the code.  The DPDK code looks questionable
(https://git.dpdk.org/dpdk/tree/drivers/net/e1000/em_rxtx.c#n715):

  volatile struct e1000_rx_desc *rxdp;
  ...
  rxdp = &rx_ring[rx_id];
  status = rxdp->status;
  if (! (status & E1000_RXD_STAT_DD))
      break;
  rxd = *rxdp;

Although there is a conditional branch on rxdp->status, the CPU may load
*rxdp before loading rxdp->status.  These pointers are volatile but
that's not enough to prevent the CPU from reordering memory accesses
(the compiler emits regular load instructions without memory ordering
directives).

This is probably not the bug that you're seeing, but it suggests there
are more problems.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]