[Qemu-devel] Re: Network shutdown under load

From:	Anthony Liguori
Subject:	[Qemu-devel] Re: Network shutdown under load
Date:	Mon, 08 Feb 2010 14:58:05 -0600
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.5) Gecko/20091209 Fedora/3.0-4.fc12 Lightning/1.0pre Thunderbird/3.0

On 02/08/2010 10:10 AM, Tom Lendacky wrote:

Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest makes more buffers available before
qemu can enable notifications.

Signed-off-by: Tom Lendacky <address@hidden>
Signed-off-by: Anthony Liguori <address@hidden>

I've walked through the changes in this series and I'm pretty certain that this is the only problem. I'd appreciate it if others could review, though.


Regards,

Anthony Liguori

  hw/virtio-net.c |   10 +++++++++-
  1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 6e48997..5c0093e 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -379,7 +379,15 @@ static int virtio_net_has_buffers(VirtIONet *n, int bufsize)
          (n->mergeable_rx_bufs &&
           !virtqueue_avail_bytes(n->rx_vq, bufsize, 0))) {
          virtio_queue_set_notification(n->rx_vq, 1);
-        return 0;
+
+        /* To avoid a race condition where the guest has made some buffers
+         * available after the above check but before notification was
+         * enabled, check for available buffers again.
+         */
+        if (virtio_queue_empty(n->rx_vq) ||
+            (n->mergeable_rx_bufs &&
+             !virtqueue_avail_bytes(n->rx_vq, bufsize, 0)))
+            return 0;
      }

      virtio_queue_set_notification(n->rx_vq, 0);

On Friday 29 January 2010 02:06:41 pm Tom Lendacky wrote:

There's been some discussion of this already in the kvm list, but I want to
summarize what I've found and also include the qemu-devel list in an effort
  to find a solution to this problem.

Running a netperf test between two kvm guests results in the guest's
  network interface shutting down. I originally found this using kvm guests
  on two different machines that were connected via a 10GbE link.  However,
  I found this problem can be easily reproduced using two guests on the same
  machine.

I am running the 2.6.32 level of the kvm.git tree and the 0.12.1.2 level of
the qemu-kvm.git tree.

The setup includes two bridges, br0 and br1.

The commands used to start the guests are as follows:
usr/local/bin/qemu-system-x86_64 -name cape-vm001 -m 1024 \
  -drive file=/autobench/var/tmp/cape-vm001-raw.img,if=virtio,index=0,media=disk,boot=on \
  -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:51,netdev=cape-vm001-eth0 \
  -netdev tap,id=cape-vm001-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
  -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:D1,netdev=cape-vm001-eth1 \
  -netdev tap,id=cape-vm001-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
  -vnc :1 -monitor telnet::5701,server,nowait -snapshot -daemonize

usr/local/bin/qemu-system-x86_64 -name cape-vm002 -m 1024 \
  -drive file=/autobench/var/tmp/cape-vm002-raw.img,if=virtio,index=0,media=disk,boot=on \
  -net nic,model=virtio,vlan=0,macaddr=00:16:3E:00:62:61,netdev=cape-vm002-eth0 \
  -netdev tap,id=cape-vm002-eth0,script=/autobench/var/tmp/ifup-kvm-br0,downscript=/autobench/var/tmp/ifdown-kvm-br0 \
  -net nic,model=virtio,vlan=1,macaddr=00:16:3E:00:62:E1,netdev=cape-vm002-eth1 \
  -netdev tap,id=cape-vm002-eth1,script=/autobench/var/tmp/ifup-kvm-br1,downscript=/autobench/var/tmp/ifdown-kvm-br1 \
  -vnc :2 -monitor telnet::5702,server,nowait -snapshot -daemonize

The ifup-kvm-br0 script takes the (first) qemu-created tap device,
  brings it up and adds it to bridge br0.  The ifup-kvm-br1 script takes the
  (second) qemu-created tap device, brings it up and adds it to bridge
  br1.

Each Ethernet device within a guest is on its own subnet.  For example:
   guest 1 eth0 has addr 192.168.100.32 and eth1 has addr 192.168.101.32
   guest 2 eth0 has addr 192.168.100.64 and eth1 has addr 192.168.101.64

On one of the guests run netserver:
   netserver -L 192.168.101.32 -p 12000

On the other guest run netperf:
   netperf -L 192.168.101.64 -H 192.168.101.32 -p 12000 -t TCP_STREAM -l 60 -c -C -- -m 16K -M 16K

It may take more than one netperf run (I find that my second run almost
  always causes the shutdown) but the network on the eth1 links will stop
  working.

I did some debugging and found that in qemu on the guest running netserver:
  - the receive_disabled variable is set and never gets reset
  - the read_poll event handler for the eth1 tap device is disabled and
  never re-enabled
These conditions result in no packets being read from the tap device and
  sent to the guest - effectively shutting down the network.  Network
  connectivity can be restored by shutting down the guest interfaces,
  unloading the virtio_net module, re-loading the virtio_net module and
  re-starting the guest interfaces.

I'm continuing to work on debugging this, but would appreciate it if some
  folks with more qemu network experience could try to recreate and debug
  this.

If my kernel config matters, I can provide that.

Thanks,
Tom
