
Re: [question] vhost-user: auto fix network link broken during migration


From: Jason Wang
Subject: Re: [question] vhost-user: auto fix network link broken during migration
Date: Thu, 26 Mar 2020 17:45:50 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0


On 2020/3/24 7:08 PM, yangke (J) wrote:
We found an issue where a host MCE triggers an openvswitch (DPDK) restart on
the source host during guest migration.

Did you mean the vhost-user netdev was deleted from the source host?

The vhost-user netdev was not deleted from the source host. I mean that:
in the normal scenario, OVS (DPDK) begins to restart, then qemu_chr disconnects
from OVS and the link status is set to link down; once OVS (DPDK) has started,
qemu_chr reconnects to OVS and the link status is set to link up. But in our
scenario, the VM migration finishes before qemu_chr reconnects to OVS. The
link_down of the frontend is loaded from n->status on the destination, which
causes the network in the guest to never come up again.


I'm not sure we should fix this in qemu.

Generally, it's the task of management to make sure the destination device configuration is the same as the source's.

E.g. in this case, management should bring up the link once the re-connection on the source has completed.
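
For instance, once the backend re-connection has completed, a management layer could issue something like the following QMP command (the netdev id "hostnet0" here is taken from the backtrace below):

    { "execute": "set_link",
      "arguments": { "name": "hostnet0", "up": true } }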

What's more, the qmp_set_link() done in vhost-user.c looks hacky: it changes the link status without management being aware of it.



qemu_chr disconnect:
#0  vhost_user_write (msg=msg@entry=0x7fff59ecb2b0, fds=fds@entry=0x0, fd_num=fd_num@entry=0, dev=0x295c730, dev=0x295c730)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:239
#1  0x00000000004e6bad in vhost_user_get_vring_base (dev=0x295c730, ring=0x7fff59ecb510)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost_user.c:497
#2  0x00000000004e2e88 in vhost_virtqueue_stop (dev=dev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0, vq=0x295c898, idx=0)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1036
#3  0x00000000004e45ab in vhost_dev_stop (hdev=hdev@entry=0x295c730, vdev=vdev@entry=0x2ca36c0)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/virtio/vhost.c:1556
#4  0x00000000004bc56a in vhost_net_stop_one (net=0x295c730, dev=dev@entry=0x2ca36c0)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:326
#5  0x00000000004bcc3b in vhost_net_stop (dev=dev@entry=0x2ca36c0, ncs=<optimized out>, total_queues=4)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/net/vhost_net.c:407
#6  0x00000000004b85f6 in virtio_net_vhost_status (n=n@entry=0x2ca36c0, status=status@entry=7 '\a')
     at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:177
#7  0x00000000004b869f in virtio_net_set_status (vdev=<optimized out>, status=<optimized out>)
     at /usr/src/debug/qemu-kvm-2.8.1/hw/net/virtio_net.c:243
#8  0x000000000073d00d in qmp_set_link (name=name@entry=0x2956d40 "hostnet0", up=up@entry=false, errp=errp@entry=0x7fff59ecd718)
     at net/net.c:1437
#9  0x00000000007460c1 in net_vhost_user_event (opaque=0x2956d40, event=4)
     at net/vhost_user.c:217   // qemu_chr_be_event
#10 0x0000000000574f0d in tcp_chr_disconnect (chr=0x2951a40) at qemu_char.c:3220
#11 0x000000000057511f in tcp_chr_hup (channel=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at qemu_char.c:3265



The frontend is still link down after migration, which causes the network in
the VM to never come up again.

virtio_net_load_device:
      /* nc.link_down can't be migrated, so infer link_down according
       * to link status bit in n->status */
      link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
      for (i = 0; i < n->max_queues; i++) {
          qemu_get_subqueue(n->nic, i)->link_down = link_down;
      }

guest:               migrate begin -----> vCPU pause ---> vmstate load ---> migrate finish
                                      ^                ^                ^
                                      |                |                |
openvswitch in source host:   begin to restart   restarting        started
nc in frontend in source:        link down        link down        link down
nc in frontend in destination:   link up          link up          link down
guest network:                    broken           broken           broken
nc in backend in source:         link down        link down        link up
nc in backend in destination:    link up          link up          link up

The link_down of the frontend is loaded from n->status; n->status is link
down on the source, so the frontend's link_down is true. The backend on the
destination host is link up, but the frontend on the destination host is link
down, which causes the network in the guest to never come up again until a
guest cold reboot.

Is there a way to auto-fix the link status? Or should we just abort the
migration during the virtio-net device load?

Maybe we can try to sync link status after migration?
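
A minimal sketch of what such a sync might look like at the end of virtio_net_load_device() (illustrative only, not an actual patch; it reuses the identifiers from the snippet above and assumes the backend's view of the link on the destination is authoritative):

      /* Illustrative only: if the backend on the destination reports
       * link up, override the migrated status bit so the frontend
       * does not stay down forever. */
      NetClientState *peer = qemu_get_queue(n->nic)->peer;
      if (peer && !peer->link_down) {
          n->status |= VIRTIO_NET_S_LINK_UP;
      }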

Thanks

In an extreme scenario, the OVS (DPDK) on the source may still not have
started after the migration finishes.


Our plan is to check the link state of the backend when loading the link_down
of the frontend:
      /* nc.link_down can't be migrated, so infer link_down according
       * to link status bit in n->status */
-    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    if (qemu_get_queue(n->nic)->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP | !qemu_get_queue(n->nic)->peer->link_down) == 0;
+    } else {
+        link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
+    }
      for (i = 0; i < n->max_queues; i++) {
          qemu_get_subqueue(n->nic, i)->link_down = link_down;
      }
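
Note that the vhost-user branch mixes & and | without parentheses; under C precedence it parses as ((n->status & VIRTIO_NET_S_LINK_UP) | !peer->link_down) == 0, i.e. the frontend stays link down only when the migrated status bit says down and the backend on the destination also reports link down. An equivalent but more explicit formulation (same logic, purely illustrative):

      /* frontend stays link down only if the migrated status bit says
       * down AND the destination backend also reports link down */
      link_down = ((n->status & VIRTIO_NET_S_LINK_UP) == 0) &&
                  qemu_get_queue(n->nic)->peer->link_down;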

Is this good enough to auto-fix the link status?


I still think it's the task of management. Trying to sync the status internally, as vhost-user currently does, may lead to bugs.

Thanks






