Re: [Qemu-devel] [RFC 0/2] Attempt to implement the standby feature for assigned network devices
Fri, 7 Dec 2018 14:36:07 -0200
On Thu, Dec 06, 2018 at 10:06:18AM +0000, Daniel P. Berrangé wrote:
> On Wed, Dec 05, 2018 at 02:24:32PM -0600, Michael Roth wrote:
> > Quoting Daniel P. Berrangé (2018-12-05 11:18:18)
> > > On Thu, Oct 25, 2018 at 05:06:29PM +0300, Sameeh Jubran wrote:
> > > > From: Sameeh Jubran <address@hidden>
> > > >
> > > > Hi all,
> > > >
> > > > Background:
> > > >
> > > > There have been a few attempts to implement the standby feature for
> > > > vfio assigned devices, which aims to enable the migration of such
> > > > devices. This is another attempt.
> > > >
> > > > The series implements an infrastructure for hiding devices from the bus
> > > > upon boot. What it does is the following:
> > > >
> > > > * In the first patch the infrastructure for hiding the device is added
> > > > for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > > > state and it is set based on a callback to the standby device, which
> > > > registers itself for handling the assessment "should the primary
> > > > device be hidden?" by cross-validating the ids of the devices.
> > > >
> > > > * In the second patch the virtio-net uses the API to hide the vfio
> > > > device and unhides it when the feature is acked.
> > >
> > > IIUC, the general idea is that we want to provide a pair of associated
> > > NIC devices to the guest, one emulated, one physical PCI device. The
> > > guest would put them in a bonded pair. Before migration the PCI device
> > > is unplugged & a new PCI device plugged on target after migration. The
> > > guest traffic continues without interruption due to the emulated device.
> > >
> > > This kind of conceptual approach can already be implemented today by
> > > management apps. The only hard problem that exists today is how the
> > > guest OS can figure out that a particular pair of devices it has are
> > > intended to be used together.
> > >
> > > With this series, IIUC, the virtio-net device is given a property which
> > > defines the qdev ID of the associated VFIO device. When the guest OS
> > > activates the virtio-net device and acknowledges the STANDBY feature
> > > bit, qdev then unhides the associated VFIO device.
> > >
> > > AFAICT the guest has to infer that the device which suddenly appears is
> > > the one associated with the virtio-net device it just initialized, for
> > > purposes of setting up the NIC bonding. There doesn't appear to be any
> > > explicit association between the devices exposed to the guest.
> > >
> > > This feels pretty fragile for a guest needing to match up devices when
> > > there are many pairs of devices exposed to a single guest.
> > The impression I get from
> > linux.git:Documentation/networking/net_failover.rst
> > is that the matching is done based on the primary/standby NICs having
> > the same MAC address. In theory you pass both to a guest and based on
> > MAC it essentially does the pairing automatically, and if you
> > additionally add STANDBY it'll know to use a virtio-net device
> > specifically for failover.
> > None of this requires any sort of hiding/plugging of devices from
> > QEMU/libvirt (except for the VFIO unplug we'd need to initiate live
> > migration and the VFIO hotplug on the other end to switch back over).
> > That simplifies things greatly, but also introduces the problem of how
> > an older guest will handle seeing 2 NICs with the same MAC, which IIUC
> > is why this series is looking at hotplugging the VFIO device only after
> > we confirm STANDBY is supported by the virtio-net device, and why it's
> > being done transparent to management.
> > >
> > > Unless I'm mis-reading the patches, it looks like the VFIO device always
> > > has to be available at the time QEMU is started. There's no way to boot
> > > a guest and then later hotplug a VFIO device to accelerate the existing
> > > virtio-net NIC. Or similarly after migration there might not be any VFIO
> > > device available initially when QEMU is started to accept the incoming
> > > migration. So it might need to run in degraded mode for an extended
> > > period of time until one becomes available for hotplugging. The use of
> > > qdev IDs makes this troublesome, as the qdev ID of the future VFIO
> > > device would need to be decided upfront before it even exists.
> > >
> > > So overall I'm not really a fan of the dynamic hiding/unhiding of
> > > devices. I would much prefer to see some way to expose an explicit
> > > relationship between the devices to the guest.
> > If we place the burden of determining whether the guest supports STANDBY
> > on the part of users/management, a lot of this complexity goes away. For
> > instance, one possible implementation is to simply fail migration and say
> > "sorry your VFIO device is still there" if the VFIO device is still around
> > at the start of migration (whether due to unplug failure or a
> > user/management forgetting to do it manually beforehand).
> > So how important is it that setting F_STANDBY cap doesn't break older
> > guests? If the idea is to support live migration with VFs then aren't
> > we still dead in the water if the guest boots okay but doesn't have
> > the requisite functionality to be migrated later? Shouldn't that all
> > be sorted out as early as possible? Is a very clear QEMU error message
> > in this case insufficient?
> > And if backward compatibility is important, are there alternative
> > approaches? Like maybe starting off with a dummy MAC and switching over
> > to the duplicate MAC only after F_STANDBY is negotiated? In that case
> > we could still warn users/management about it but still have the guest
> > be otherwise functional.
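
[Editorial sketch, not part of the original messages.] The "fail migration if
the VFIO device is still there" option described above can be modelled in a
few lines. This is only an illustration of the control flow, not QEMU code:
the Guest class, the check_migratable() helper and the device ids are all
hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


class MigrationBlocked(Exception):
    """Raised when a passthrough device is still attached at migration start."""


@dataclass
class Guest:
    # Hypothetical model: device id -> device kind ("vfio-pci", "virtio-net").
    devices: dict = field(default_factory=dict)


def check_migratable(guest: Guest) -> None:
    """Refuse to migrate while any VFIO device is still plugged in.

    Covers both cases from the mail: an unplug that failed, and a
    user/management layer that forgot to unplug beforehand.
    """
    leftover = sorted(
        dev_id for dev_id, kind in guest.devices.items() if kind == "vfio-pci"
    )
    if leftover:
        raise MigrationBlocked(
            "VFIO device(s) still attached: " + ", ".join(leftover)
        )
```

With this shape, the management layer gets a clear error naming the offending
device instead of a migration that fails midway.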
> Relying on F_STANDBY negotiation to decide whether to activate the VFIO
> device is a bad idea. PCI devices are precious, so if the guest OS does
> not support this standby feature, we must never add the VFIO device to
> QEMU in the first place.
> We have the libosinfo project which provides metadata on what features
> different guest OS versions support. This can be used to indicate whether
> a guest OS version supports the standby NIC concept and thus avoid needing
> to allocate PCI devices to guests that will never use them.
> F_STANDBY is still useful as a flag to inform the guest OS that it should
> pair up NICs with identical MACs, as opposed to configuring them separately.
> It shouldn't be used to show/hide the device though, we should simply never
> add the 2nd device if we know it won't be used by a given guest OS version.
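
[Editorial sketch, not part of the original messages.] The role of F_STANDBY
described above, as a hint that NICs with identical MACs should be paired
rather than configured separately, could look roughly like the following from
the guest's point of view. It is a simplified model under stated assumptions:
plan_nics() and the tuple layout are hypothetical, and the flag is treated as
one global boolean although it is really negotiated per virtio-net device.

```python
from collections import defaultdict


def plan_nics(nics, standby_negotiated):
    """Group NICs into failover pairs or leave them standalone.

    nics: list of (name, mac, kind) tuples, kind being "virtio-net" or
    "vfio-pci". Returns (pairs, standalone), where pairs is a list of
    (standby_name, primary_name) tuples.
    """
    # Without the feature bit, every NIC is configured on its own.
    if not standby_negotiated:
        return [], [name for name, _mac, _kind in nics]

    # Bucket NICs by MAC, ignoring case differences in the hex digits.
    by_mac = defaultdict(list)
    for name, mac, kind in nics:
        by_mac[mac.lower()].append((name, kind))

    pairs, standalone = [], []
    for group in by_mac.values():
        standby = [n for n, k in group if k == "virtio-net"]
        primary = [n for n, k in group if k == "vfio-pci"]
        # Pair only an unambiguous virtio-net + VFIO couple.
        if len(group) == 2 and len(standby) == 1 and len(primary) == 1:
            pairs.append((standby[0], primary[0]))
        else:
            standalone.extend(n for n, _k in group)
    return pairs, standalone
```

A virtio-net NIC with no matching VFIO device simply stays standalone, which
matches the case where management never added the second device.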
The two mechanisms are not exclusive. Not wasting a PCI device
if the guest OS won't use it is a good idea. Making the guest
behave gracefully even when an older driver is loaded is also