Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64


From: Alexey Kardashevskiy
Subject: Re: [Qemu-devel] vhost-net issue: does not survive reboot on ppc64
Date: Mon, 06 Jan 2014 20:57:00 +1100
User-agent: Mozilla/5.0 (X11; Linux i686 on x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 12/27/2013 12:44 PM, Alexey Kardashevskiy wrote:
> On 12/27/2013 02:12 AM, Michael S. Tsirkin wrote:
>> On Fri, Dec 27, 2013 at 01:59:19AM +1100, Alexey Kardashevskiy wrote:
>>> On 12/27/2013 12:48 AM, Michael S. Tsirkin wrote:
>>>> On Thu, Dec 26, 2013 at 11:51:04PM +1100, Alexey Kardashevskiy wrote:
>>>>> On 12/26/2013 09:49 PM, Michael S. Tsirkin wrote:
>>>>>> On Thu, Dec 26, 2013 at 09:13:31PM +1100, Alexey Kardashevskiy wrote:
>>>>>>> On 12/25/2013 08:52 PM, Michael S. Tsirkin wrote:
>>>>>>>> On Wed, Dec 25, 2013 at 12:36:12PM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>> On 12/25/2013 02:43 AM, Michael S. Tsirkin wrote:
>>>>>>>>>> On Wed, Dec 25, 2013 at 01:15:29AM +1100, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 12/24/2013 08:40 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Tue, Dec 24, 2013 at 02:09:07PM +1100, Alexey Kardashevskiy 
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On 12/24/2013 03:24 AM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>> On Mon, Dec 23, 2013 at 02:01:13AM +1100, Alexey Kardashevskiy 
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On 12/23/2013 01:46 AM, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>>> On 12/22/2013 09:56 PM, Michael S. Tsirkin wrote:
>>>>>>>>>>>>>>>>> On Sun, Dec 22, 2013 at 02:01:23AM +1100, Alexey 
>>>>>>>>>>>>>>>>> Kardashevskiy wrote:
>>>>>>>>>>>>>>>>>> Hi!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am having a problem with virtio-net + vhost on a POWER7
>>>>>>>>>>>>>>>>>> machine - it does not survive a reboot of the guest.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Steps to reproduce:
>>>>>>>>>>>>>>>>>> 1. boot the guest
>>>>>>>>>>>>>>>>>> 2. configure eth0 and do ping - everything works
>>>>>>>>>>>>>>>>>> 3. reboot the guest (i.e. type "reboot")
>>>>>>>>>>>>>>>>>> 4. when it is booted, eth0 can be configured but will not 
>>>>>>>>>>>>>>>>>> work at all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The test is:
>>>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If I run tcpdump on the host's "tap-id3" interface, it
>>>>>>>>>>>>>>>>>> shows no traffic coming from the guest. Comparing how it
>>>>>>>>>>>>>>>>>> works before and after the reboot: before, the guest does
>>>>>>>>>>>>>>>>>> an ARP request for 172.20.1.23 and receives the response;
>>>>>>>>>>>>>>>>>> after the reboot it does the same request but the answer
>>>>>>>>>>>>>>>>>> never comes.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So you see the arp packet in guest but not in host?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> One thing to try is to boot a debug kernel - one where
>>>>>>>>>>>>>>>>> pr_debug is enabled - then you might see some errors in the
>>>>>>>>>>>>>>>>> kernel log.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Tried that, and added a lot more debug printk myself; it is
>>>>>>>>>>>>>>>> not at all clear what is happening there.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> One more hint - if I boot the guest, the guest does not bring
>>>>>>>>>>>>>>>> eth0 up, and I wait more than 200 seconds (and less than 210
>>>>>>>>>>>>>>>> seconds), then eth0 will not work at all. I.e. this script
>>>>>>>>>>>>>>>> produces a non-working eth0:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 down
>>>>>>>>>>>>>>>> sleep 210
>>>>>>>>>>>>>>>> ifconfig eth0 172.20.1.2 up
>>>>>>>>>>>>>>>> ping 172.20.1.23
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> s/210/200/ - and it starts working. No reboot is required to 
>>>>>>>>>>>>>>>> reproduce.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> No "vhost" == always works. The only difference I can see
>>>>>>>>>>>>>>>> here is vhost's kernel thread, which may get suspended if it
>>>>>>>>>>>>>>>> is not used for a while after start-up and then never wakes
>>>>>>>>>>>>>>>> up - but this is almost a blind guess.
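
For reference, the worker being suspected here looks roughly like this - a
sketch of vhost_worker() abridged from the 3.12-era drivers/vhost/vhost.c
(seq/flush and kthread_should_stop handling elided). With an empty work list
the thread simply schedule()s and stays asleep until someone calls
wake_up_process() on it:

===
static int vhost_worker(void *data)
{
        struct vhost_dev *dev = data;
        struct vhost_work *work = NULL;

        for (;;) {
                /* implies a barrier, pairs with wake_up_process() */
                set_current_state(TASK_INTERRUPTIBLE);

                spin_lock_irq(&dev->work_lock);
                if (!list_empty(&dev->work_list)) {
                        work = list_first_entry(&dev->work_list,
                                                struct vhost_work, node);
                        list_del_init(&work->node);
                } else {
                        work = NULL;
                }
                spin_unlock_irq(&dev->work_lock);

                if (work) {
                        __set_current_state(TASK_RUNNING);
                        work->fn(work);         /* e.g. handle_rx_net */
                } else {
                        schedule();             /* sleep until woken */
                }
        }
        return 0;
}
===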
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yet another clue - this host kernel patch seems to help with
>>>>>>>>>>>>>>> the guest reboot but does not help with the initial 210-second
>>>>>>>>>>>>>>> delay:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> index 69068e0..5e67650 100644
>>>>>>>>>>>>>>> --- a/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> +++ b/drivers/vhost/vhost.c
>>>>>>>>>>>>>>> @@ -162,10 +162,10 @@ void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
>>>>>>>>>>>>>>>                 list_add_tail(&work->node, &dev->work_list);
>>>>>>>>>>>>>>>                 work->queue_seq++;
>>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>> -               wake_up_process(dev->worker);
>>>>>>>>>>>>>>>         } else {
>>>>>>>>>>>>>>>                 spin_unlock_irqrestore(&dev->work_lock, flags);
>>>>>>>>>>>>>>>         }
>>>>>>>>>>>>>>> +       wake_up_process(dev->worker);
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(vhost_work_queue);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
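
For clarity, with the change applied the function would read roughly like
this - the wake_up_process() call moves out of the if/else, so the worker is
poked unconditionally, even when the work item was already queued:

===
void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work)
{
        unsigned long flags;

        spin_lock_irqsave(&dev->work_lock, flags);
        if (list_empty(&work->node)) {
                /* not queued yet: add it and bump the sequence number */
                list_add_tail(&work->node, &dev->work_list);
                work->queue_seq++;
                spin_unlock_irqrestore(&dev->work_lock, flags);
        } else {
                spin_unlock_irqrestore(&dev->work_lock, flags);
        }
        /* moved out of the if (): always wake the worker */
        wake_up_process(dev->worker);
}
===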
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Interesting. Some kind of race? A missing memory barrier 
>>>>>>>>>>>>>> somewhere?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not see how. I boot the guest and just wait 210 seconds;
>>>>>>>>>>>>> nothing happens that could cause a race.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Since it's all around startup,
>>>>>>>>>>>>>> you can try kicking the host eventfd in
>>>>>>>>>>>>>> vhost_net_start.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> How exactly? This did not help. Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
>>>>>>>>>>>>> index 006576d..407ecf2 100644
>>>>>>>>>>>>> --- a/hw/net/vhost_net.c
>>>>>>>>>>>>> +++ b/hw/net/vhost_net.c
>>>>>>>>>>>>> @@ -229,6 +229,17 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
>>>>>>>>>>>>>          if (r < 0) {
>>>>>>>>>>>>>              goto err;
>>>>>>>>>>>>>          }
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        VHostNetState *vn = tap_get_vhost_net(ncs[i].peer);
>>>>>>>>>>>>> +        struct vhost_vring_file file = {
>>>>>>>>>>>>> +            .index = i
>>>>>>>>>>>>> +        };
>>>>>>>>>>>>> +        file.fd = event_notifier_get_fd(virtio_queue_get_host_notifier(dev->vq));
>>>>>>>>>>>>> +        r = ioctl(vn->dev.control, VHOST_SET_VRING_KICK, &file);
>>>>>>>>>>>>
>>>>>>>>>>>> No, this sets the notifier, it does not kick.
>>>>>>>>>>>> To kick you write 1 there:
>>>>>>>>>>>>    uint6_t  v = 1;
>>>>>>>>>>>>    write(fd, &v, sizeof v);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Please, be precise. How/where do I get that @fd? Is what I do 
>>>>>>>>>>> correct?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> What
>>>>>>>>>>> is uint6_t - uint8_t or uint16_t (neither works)?
>>>>>>>>>>
>>>>>>>>>> Sorry, should have been uint64_t.
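
So the kick is nothing more than an 8-byte write to the eventfd that was
registered with VHOST_SET_VRING_KICK - a minimal sketch, error handling
mostly omitted:

===
#include <stdint.h>
#include <unistd.h>

/* Manually kick a vhost virtqueue: an eventfd counter is 64 bits wide,
 * so the write must be exactly 8 bytes. 'fd' is the same descriptor
 * that was handed to the kernel via VHOST_SET_VRING_KICK. */
static void kick_vring(int fd)
{
    uint64_t v = 1;

    (void)write(fd, &v, sizeof(v)); /* increments the eventfd counter */
}
===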
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Oh, that I missed :-) Anyway, this does not make any difference. Is
>>>>>>>>> there any cheap&dirty way to keep the vhost-net kernel thread always
>>>>>>>>> awake? Sending it signals from user space does not work...
>>>>>>>>
>>>>>>>> You can run a timer in qemu and signal the eventfd from there
>>>>>>>> periodically.
>>>>>>>>
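
A sketch of such a debug hack, assuming the QEMU 1.7 timer API and kicking
only vq 0 ('kick_timer' and its setup are made up for illustration):

===
/* Debug hack (sketch only): re-signal the host notifier of vq 0 every
 * 100 ms so the vhost worker keeps getting woken. */
static QEMUTimer *kick_timer;

static void kick_timer_cb(void *opaque)
{
    VirtIODevice *vdev = opaque;
    VirtQueue *vq = virtio_get_queue(vdev, 0);

    event_notifier_set(virtio_queue_get_host_notifier(vq));
    timer_mod(kick_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
}

/* e.g. somewhere in vhost_net_start():
 *   kick_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, kick_timer_cb, dev);
 *   timer_mod(kick_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 100);
 */
===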
>>>>>>>> Just to restate, tcpdump in guest shows that guest sends arp packet,
>>>>>>>> but tcpdump in host on tun device does not show any packets?
>>>>>>>
>>>>>>>
>>>>>>> OK, I figured out the business with disabling interfaces in Fedora 19.
>>>>>>> I was wrong: something is happening on the host's TAP - the guest sends
>>>>>>> an ARP request, and the response is visible on the TAP interface but
>>>>>>> not in the guest.
>>>>>>
>>>>>> Okay. So the problem is on the host-to-guest path then.
>>>>>> Things to try:
>>>>>>
>>>>>> 1. trace handle_rx [vhost_net]
>>>>>> 2. trace tun_put_user [tun]
>>>>>> 3. I suspect a host bug in one of the features.
>>>>>> Let's try disabling some flags with device properties;
>>>>>> you can get the list by doing:
>>>>>> ./x86_64-softmmu/qemu-system-x86_64 -device virtio-net-pci,?|grep on/off
>>>>>> Things I would try turning off are the guest offloads (the ones that
>>>>>> start with guest_), plus event_idx, any_layout and mq.
>>>>>> Turn them all off; if that helps, try to find the one that helped.
>>>>>
>>>>>
>>>>> Heh. It still would be awesome to read the basics about this vhost thing,
>>>>> as I am debugging blindly :)
>>>>>
>>>>> Regarding your suggestions.
>>>>>
>>>>> 1. I put "printk" in handle_rx and tun_put_user.
>>>>
>>>> Fine, though it's easier with ftrace - see http://lwn.net/Articles/370423/
>>>> and look for function filtering.
>>>>
>>>>> handle_rx stopped being called 2:40 after the guest start, and
>>>>> tun_put_user stopped 0:20 after the guest start (accuracy is 5 seconds).
>>>>> If I bring the guest's eth0 up while handle_rx is still printing, it
>>>>> works, i.e. tun_put_user is called a lot. Once handle_rx has stopped,
>>>>> nothing can bring eth0 back to life.
>>>>
>>>> OK, so what should happen is that handle_rx is called
>>>> when you bring eth0 up.
>>>> Do you see this?
>>>> The way it is supposed to work is this:
>>>>
>>>> vhost_net_enable_vq calls vhost_poll_start then
>>>
>>>
>>> This and what follows it is called while QEMU is still booting the guest
>>> (in response to a PCI enable? somewhere in the middle of the PCI discovery
>>> process), and VHOST_NET_SET_BACKEND is never called again after that.
>>>
>>
>> What should happen is that up/down in the guest
>> will call virtio_net_vhost_status in QEMU,
>> and then vhost_net_start/vhost_net_stop is called
>> accordingly.
>> These call the VHOST_NET_SET_BACKEND ioctl.
>>
>> You don't see this?
> 
> 
> Nope. What I see is that vhost_net_start is only called on
> VIRTIO_PCI_STATUS and never after that, as the PCI status does not change
> (does it?).
> 
> The log of QEMU + gdb with some breakpoints:
> http://pastebin.com/CSN6iSn6
> 
> In this example, I did not wait ~240 seconds, so it works - but it still
> does not print what you say it should print :)
> 
> Here is what I run:
> http://ozlabs.ru/gitweb/?p=qemu/.git;a=shortlog;h=refs/heads/vhostdbg
> 
> Thanks!
> 
> [ time to go to the ocean :) ]


I am back. Are you? :)

Looked a bit further. In the guest's virtnet_set_rx_mode()
(drivers/net/virtio_net.c) I added this:

===
struct scatterlist sg;
struct virtio_net_ctrl_mq s;

/* Resend the MQ "1 queue pair" command over the control vq, hoping the
 * control-plane traffic makes QEMU re-evaluate the vhost state. */
s.virtqueue_pairs = 1;
sg_init_one(&sg, &s, sizeof(s));
virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
                     VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg, NULL);
===

... in the desperate hope that it would signal QEMU to stop vhost in
virtio_net_vhost_status(). But QEMU does not call vhost_net_stop(), as the
link is up - it has been up since virtnet_probe() and never goes down, so I
guess this is by design.


So... the vhost-net thread in the host goes to sleep and there is no way to
wake it up from the guest, as "ifconfig eth0 down ; ifconfig eth0 up" changes
neither the link status nor @VirtIODevice::status.
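
For reference, the gating condition in QEMU's virtio_net_vhost_status()
(hw/net/virtio-net.c, 1.7-era, abridged from memory - treat this as a
sketch) shows why: vhost is only started or stopped when the combination of
driver status and link state actually flips, and neither changes across
"ifconfig eth0 down ; ifconfig eth0 up":

===
static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
{
    NetClientState *nc = qemu_get_queue(n->nic);

    if (!nc->peer || !tap_get_vhost_net(nc->peer)) {
        return;
    }
    /* nothing happens unless "driver OK && link up" actually changes */
    if (!!n->vhost_started ==
        (virtio_net_started(n, status) && !nc->peer->link_down)) {
        return;
    }
    if (!n->vhost_started) {
        /* ... vhost_net_start() ... */
    } else {
        /* ... vhost_net_stop() ... */
    }
}
===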


What would be the right thing to do now? Implement link state management?
Or invent "virtio link" and leave QEMU's nc->peer->link_down alone?

Or there is some way to tell the kernel thread not to sleep?

Thanks!



> 
> 
>>>
>>>> this calls mask = file->f_op->poll(file, &poll->table)
>>>> on the tun file,
>>>> which calls tun_chr_poll.
>>>> At this point there are packets queued on tun already,
>>>> so that returns POLLIN | POLLRDNORM;
>>>> this calls vhost_poll_wakeup, which checks the mask against
>>>> the key.
>>>> The key is POLLIN, so vhost_poll_queue is called.
>>>> This in turn calls vhost_work_queue:
>>>> either the work list is empty, in which case we wake up the worker,
>>>> or it's not empty, in which case the worker is running our jobs anyway.
>>>> This will then invoke handle_rx_net.
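
That wakeup callback, roughly as in the 3.12-era drivers/vhost/vhost.c, for
reference - it only queues work when the reported events intersect the mask
the poll was armed with:

===
static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync,
                             void *key)
{
        struct vhost_poll *poll = container_of(wait, struct vhost_poll, wait);

        if (!((unsigned long)key & poll->mask))
                return 0;       /* not the event we are polling for */

        vhost_poll_queue(poll); /* -> vhost_work_queue() -> wake worker */
        return 0;
}
===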
>>>>
>>>>
>>>>> 2. This is exactly how I run QEMU now. I basically set "off" for every
>>>>> on/off parameter. This did not change anything.
>>>>>
>>>>> ./qemu-system-ppc64 \
>>>>>   -enable-kvm \
>>>>>   -m 2048 \
>>>>>   -L qemu-ppc64-bios/ \
>>>>>   -machine pseries \
>>>>>   -trace events=qemu_trace_events \
>>>>>   -kernel vml312 \
>>>>>   -append root=/dev/sda3 virtimg/fc19_16GB_vhostdbg.qcow2 \
>>>>>   -nographic \
>>>>>   -vga none \
>>>>>   -nodefaults \
>>>>>   -chardev stdio,id=id0,signal=off,mux=on \
>>>>>   -device spapr-vty,id=id1,chardev=id0,reg=0x71000100 \
>>>>>   -mon id=id2,chardev=id0,mode=readline \
>>>>>   -netdev
>>>>> tap,id=id3,ifname=tap-id3,script=ifup.sh,downscript=ifdown.sh,vhost=on \
>>>>>   -device
>>>>> virtio-net-pci,id=id4,netdev=id3,mac=C0:41:49:4b:00:00,tx=timer,ioeventfd=off,\
>>>>> indirect_desc=off,event_idx=off,any_layout=off,csum=off,guest_csum=off,\
>>>>> gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,\
>>>>> host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,\
>>>>> status=off,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_rx_extra=off,\
>>>>> ctrl_mac_addr=off,ctrl_guest_offloads=off,mq=off,multifunction=off,\
>>>>> command_serr_enable=off \
>>>>>   -netdev user,id=id5,hostfwd=tcp::5000-:22 \
>>>>>   -device spapr-vlan,id=id6,netdev=id5,mac=C0:41:49:4b:00:01
>>>>>
>>>>
>>>> Yes, this looks like some kind of race.
>>>
>>>
>>> -- 
>>> Alexey
> 
> 


-- 
Alexey


