
Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64


From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
Date: Wed, 25 Dec 2013 11:52:43 +0200

On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
> > On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
> >> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
> >>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy wrote:
> >>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
> >>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy wrote:
> >>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
> >>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
> >>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey Kardashevskiy wrote:
> >>>>>>>>> Hi!
> >>>>>>>>>
> >>>>>>>>> I am having a problem with virtio-net + vhost on a POWER7 machine -
> >>>>>>>>> it does not survive a reboot of the guest.
> >>>>>>>>>
> >>>>>>>>> Steps to reproduce:
> >>>>>>>>> 1. boot the guest
> >>>>>>>>> 2. configure eth0 and do ping - everything works
> >>>>>>>>> 3. reboot the guest (i.e. type "reboot")
> >>>>>>>>> 4. when it is booted, eth0 can be configured but will not work at 
> >>>>>>>>> all.
> >>>>>>>>>
> >>>>>>>>> The test is:
> >>>>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>>>> ping 172.20.1.23
> >>>>>>>>>
> >>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it shows no
> >>>>>>>>> traffic coming from the guest. Comparing how it works before and
> >>>>>>>>> after reboot: before the reboot the guest does an ARP request for
> >>>>>>>>> 172.20.1.23 and receives the response; after the reboot it does the
> >>>>>>>>> same request but the answer never comes.
> >>>>>>>>
> >>>>>>>> So you see the ARP packet in the guest but not on the host?
> >>>>>>>
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>>
> >>>>>>>> One thing to try is to boot a debug kernel - one where pr_debug is
> >>>>>>>> enabled - then you might see some errors in the kernel log.
> >>>>>>>
> >>>>>>> Tried that and added a lot more debug printk myself; it is not clear
> >>>>>>> at all what is happening there.
> >>>>>>>
> >>>>>>> One more hint - if I boot the guest, do not bring eth0 up, and wait
> >>>>>>> more than 200 seconds (the threshold is somewhere between 200 and 210
> >>>>>>> seconds), then eth0 will not work at all. I.e. this script produces a
> >>>>>>> non-working eth0:
> >>>>>>>
> >>>>>>>
> >>>>>>> ifconfig eth0 172.20.1.2 down
> >>>>>>> sleep 210
> >>>>>>> ifconfig eth0 172.20.1.2 up
> >>>>>>> ping 172.20.1.23
> >>>>>>>
> >>>>>>> s/210/200/ - and it starts working. No reboot is required to 
> >>>>>>> reproduce.
> >>>>>>>
> >>>>>>> No "vhost" == always works. The only difference I can see here is 
> >>>>>>> vhost's
> >>>>>>> thread which may get suspended if not used for a while after the 
> >>>>>>> start and
> >>>>>>> does not wake up but this is almost a blind guess.
> >>>>>>
> >>>>>>
> >>>>>> Yet another clue - this host kernel patch seems to help with the guest
> >>>>>> reboot case but does not help with the initial 210-second delay:
> >>>>>>
> >>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> >>>>>> index 69068e0..5e67650 100644
> >>>>>> --- a/drivers/vhost/vhost.c
> >>>>>> +++ b/drivers/vhost/vhost.c
> >>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
> >>>>>>                 list_add_tail(&work->node, &dev->work_list);
> >>>>>>                 work->queue_seq++;
> >>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>> -               wake_up_process(dev->worker);
> >>>>>>         } else {
> >>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
> >>>>>>         }
> >>>>>> +       wake_up_process(dev->worker);
> >>>>>>  }
> >>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
> >>>>>>
> >>>>>
> >>>>> Interesting. Some kind of race? A missing memory barrier somewhere?
> >>>>
> >>>> I do not see how. I boot the guest and just wait 210 seconds; nothing
> >>>> happens that could cause a race.
> >>>>
> >>>>
> >>>>> Since it's all around startup,
> >>>>> you can try kicking the host eventfd in
> >>>>> vhost_net_start.
> >>>>
> >>>>
> >>>> How exactly? This did not help. Thanks.
> >>>>
> >>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >>>> index 006576d..407ecf2 100644
> >>>> --- a/hw/net/vhost_net.c
> >>>> +++ b/hw/net/vhost_net.c
> >>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
> >>>>          if (r < 0) {
> >>>>              goto err;
> >>>>          }
> >>>> +
> >>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
> >>>> +        struct vhost_vring_file file = {
> >>>> +            .index = i
> >>>> +        };
> >>>> +        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
> >>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
> >>>
> >>> No, this sets the notifier, it does not kick.
> >>> To kick you write 1 there:
> >>>   uint6_t  v = 1;
> >>>   write(fd, &v, sizeof v);
> >>
> >>
> >> Please, be precise. How/where do I get that @fd? Is what I do correct?
> > 
> > Yes.
> > 
> >> What
> >> is uint6_t - uint8_t or uint16_t (neither works)?
> > 
> > Sorry, should have been uint64_t.
> 
> 
> Oh, that I missed :-) Anyway, this does not make any difference. Is there
> any cheap&dirty way to keep the vhost-net kernel thread always awake?
> Sending it signals from user space does not work...

You can run a timer in qemu and signal the eventfd from there
periodically.
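
Something like this, just as a sketch to test the lost-wakeup theory, not
as a fix (kick_timer, kick_vq and kick_timer_cb are made-up names, and I
am assuming the current timer API from include/qemu/timer.h):

static QEMUTimer *kick_timer;
static VirtQueue *kick_vq;    /* stash this in vhost_net_start() */

static void kick_timer_cb(void *opaque)
{
    /* an eventfd is signalled by writing a non-zero 8-byte value to it */
    uint64_t v = 1;
    int fd = event_notifier_get_fd(virtio_queue_get_host_notifier(kick_vq));

    if (write(fd, &v, sizeof(v)) != sizeof(v)) {
        error_report("eventfd kick failed");
    }
    /* re-arm so the vring gets kicked once a second */
    timer_mod(kick_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);
}

Then in vhost_net_start(), after the vrings are set up:

    kick_vq = virtio_get_queue(dev, 0);
    kick_timer = timer_new_ms(QEMU_CLOCK_REALTIME, kick_timer_cb, NULL);
    timer_mod(kick_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 1000);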

Just to restate: tcpdump in the guest shows that the guest sends the ARP
packet, but tcpdump on the host's tun device does not show any packets?

If yes, other things to try (examples follow after this list):
1. trace handle_tx [vhost_net]
2. trace tun_get_user [tun]
3. I suspect a guest bug in one of the features.
Let's try disabling some flags with device properties;
you can get the list by doing:
./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
The things I would try turning off are the host offloads (the ones that
start with host_) plus event_idx, any_layout and mq.
Turn them all off; if that helps, try to find the single one that helped.
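
For 1 and 2, something like this with ftrace should work (assuming debugfs
is mounted at /sys/kernel/debug; double-check the function names against
your kernel, they may be inlined):

echo 'handle_tx tun_get_user' > /sys/kernel/debug/tracing/set_ftrace_filter
echo function > /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/trace_pipe

For 3, the -device line would look something like this (flag names from
memory, so verify them against the grep output above):

-device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,event_idx=off,any_layout=off,mq=off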


> 
> 
> >> Maybe it is a missing barrier - I rebooted the machine several times and
> >> now it sometimes works even after 240 seconds (not 210 as before), but
> >> most of the time it still does not...
> >>
> >>
> >>>> +        if (r) {
> >>>> +            error_report("Error notifying host notifier: %d", -r);
> >>>> +            goto err;
> >>>> +        }
> >>>>      }
> >>>>
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>>> If I remove vhost=on, it is all good. If I try Fedora 19
> >>>>>>>>> (v3.10-something), it is all good again - it works before and after
> >>>>>>>>> reboot.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And there are 2 questions:
> >>>>>>>>>
> >>>>>>>>> 1. does anybody have any clue what might go wrong after reboot?
> >>>>>>>>>
> >>>>>>>>> 2. Is there any good material to read about what exactly vhost
> >>>>>>>>> accelerates and how?
> >>>>>>>>>
> >>>>>>>>> My understanding is that packets from the guest to the real network
> >>>>>>>>> go like this:
> >>>>>>>>> 1. the guest's virtio-pci-net does ioport(VIRTIO_PCI_QUEUE_NOTIFY)
> >>>>>>>>> 2. QEMU's net/virtio-net.c calls qemu_net_queue_deliver()
> >>>>>>>>> 3. QEMU's net/tap.c calls tap_write_packet(), and this is how the
> >>>>>>>>> host knows that there is a new packet.
> >>>>>>>
> >>>>>>>
> >>>>>>> What about the documentation? :) Or at least the general idea?
> >>>>>>>
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thanks!
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is how I run QEMU:
> >>>>>>>>> ./qemu-system-ppc64 \
> >>>>>>>>> -enable-kvm \
> >>>>>>>>> -m 2048 \
> >>>>>>>>> -machine pseries \
> >>>>>>>>> -initrd 1.cpio \
> >>>>>>>>> -kernel vml312_virtio_net_dbg \
> >>>>>>>>> -nographic \
> >>>>>>>>> -vga none \
> >>>>>>>>> -netdev tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
> >>>>>>>>> -device virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This is the bridge config:
> >>>>>>>>> address@hidden ~]$ brctl show
> >>>>>>>>> bridge name bridge id               STP enabled     interfaces
> >>>>>>>>> brtest              8000.00145e992e88       no      pin     eth4
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> The ifup.sh script:
> >>>>>>>>> ifconfig $1 hw ether ee:01:02:03:04:05
> >>>>>>>>> /sbin/ifconfig $1 up
> >>>>>>>>> /usr/sbin/brctl addif brtest $1
> 
> 
> 
> -- 
> Alexey


