Re: [Qemu-devel] Bug in virtio_net

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] Bug in virtio_net_load

From:	Jason Wang
Subject:	Re: [Qemu-devel] Bug in virtio_net_load
Date:	Fri, 1 Jul 2016 10:35:58 +0800
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0



On 2016年07月01日 01:23, Michael S. Tsirkin wrote:

On Thu, Jun 30, 2016 at 10:34:51AM +0200, Robin Geuze wrote:

Hey,

I work for TransIP and we host a VPS platform based on QEMU/KVM. We are
currently running qemu 2.4.0. A few days ago we noticed that live migrations
for some of our VM's would fail. Further investigation turned out it was
specific to windows server 2012, caused by the fact that the standard virtio
driver from RedHat was replaced in windows updates by a driver called
"Midfin eFabric" (this driver doesn't really seem to be meant for virtio, we
have a case running at MicroSoft about that).  Once we knew how to reproduce
we tested this on  QEMU 2.6.0 as well and it also seems to be affected
(later we found out that 2.4.0 to 2.6.0 migration does work probably due to
pure luck).

We started investigating the problem in QEMU 2.4.0 and noticed it was caused
by the fact that virtio_net_device_load requires certain feature flags to be
set, specifically to load curr_guest_offloads which is only written and read
if the VIRTIO_NET_F_CTRL_GUEST_OFFLOADS flag is set, but those flags are set
in virtio_load after the call to virtio_net_device_load. Moving the code
setting the feature flags before the call to virtio_net_device_load fixes
it, however it introduces another problem. Virtio can have 64-bits feature
flags, however the standard save payload for virtio only has space for
32-bits feature flags. This was solved by putting those in a subsection of
the vmstate_save_state stuff. Unfortunately this is called (and thus binary
offset located) after the virtio_net_device_load code.

There was an attempt to fix this in QEMU 2.6.0. However, this seems to have
broken it worse. The write code (virtio_net_save, virtio_save and
virtio_net_save_device) still puts the curr_guest_offloads value before the
vmstate_save_state data. However the read code expects and tries to read it
after the vmstate_save_state data. Should we just also change the
virtio_net_save code to have it follow the same order as virtio_net_load? Or
will this potentially break more stuff.

Regards,

Robin Geuze

TransIP BV

After going over it several times, I think the change in 2.6
was wrong



commit 1f8828ef573c83365b4a87a776daf8bcef1caa21
Author: Jason Wang <address@hidden>
Date:   Fri Sep 11 16:01:56 2015 +0800

     virtio-net: unbreak self announcement and guest offloads after migration

After commit 019a3edbb25f1571e876f8af1ce4c55412939e5d ("virtio: make

     features 64bit wide"). Device's guest_features was actually set after
     vdc->load(). This breaks the assumption that device specific load()
     function can check guest_features. For virtio-net, self announcement
     and guest offloads won't work after migration.

Fixing this by defer them to virtio_net_load() where guest_features

     were guaranteed to be set. Other virtio devices looks fine.

Fixes: 019a3edbb25f1571e876f8af1ce4c55412939e5d

            ("virtio: make features 64bit wide")
     Cc: address@hidden
     Cc: Gerd Hoffmann <address@hidden>
     Signed-off-by: Jason Wang <address@hidden>
     Reviewed-by: Michael S. Tsirkin <address@hidden>
     Signed-off-by: Michael S. Tsirkin <address@hidden>
     Reviewed-by: Cornelia Huck <address@hidden>


I'm not sure what was I thinking when I applied this:
it changes load without changing save - how can this work?


I am inclined to revert 1f8828ef573c83365b4a87a776daf8bcef1caa21 and
apply this instead:

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7ed06ea..18153d5 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1499,6 +1499,16 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int 
version_id)
      }
      qemu_get_be32s(f, &features);

+ /*

+     * Temporarily set guest_features low bits - needed by
+     * virtio net load code testing for VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
+     * VIRTIO_NET_F_GUEST_ANNOUNCE and VIRTIO_NET_F_CTRL_VQ.
+     *
+     * Note: devices should always test host features in future - don't create
+     * new dependencies like this.
+     */
+    vdev->guest_features = features;
+
      config_len = qemu_get_be32(f);

/*


Could you please confirm whether this help?
Jason, Cornelia - any comments?

Yes, my patch was wrong and won't work if there's any subsections. Iagree to revert and apply yours.


Thanks


David, if this goes in I'm afraid your patchset reworking
save/load will have to be rebased, but I think we want
the bugfix first and new features/changes second.
Do you agree?

[Prev in Thread]

Current Thread

[Next in Thread]

[Qemu-devel] Bug in virtio_net_load, Robin Geuze, 2016/06/30
- Re: [Qemu-devel] Bug in virtio_net_load, Michael S. Tsirkin, 2016/06/30
  - Re: [Qemu-devel] Bug in virtio_net_load, Dr. David Alan Gilbert, 2016/06/30
  - Re: [Qemu-devel] Bug in virtio_net_load, Jason Wang <=

Prev by Date: Re: [Qemu-devel] [PATCH] target-i386: Show host and VM TSC frequencies on mismatch
Next by Date: Re: [Qemu-devel] [PATCH] pci_register_bar: cleanup
Previous by thread: Re: [Qemu-devel] Bug in virtio_net_load
Next by thread: [Qemu-devel] [PATCH] nec-usb-xhci: set the device state to USB_STATE_DEFAULT
Index(es):
- Date
- Thread