

From: Lan, Tianyu
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Thu, 10 Dec 2015 11:04:54 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0


On 12/10/2015 4:07 AM, Michael S. Tsirkin wrote:
On Thu, Dec 10, 2015 at 12:26:25AM +0800, Lan, Tianyu wrote:
On 12/8/2015 12:50 AM, Michael S. Tsirkin wrote:
I thought about what this is doing at the high level, and I do see some
value in what you are trying to do, but I also think we need to clarify
the motivation a bit more. What you are saying is not really what the
patches are doing.

And with that clearer understanding of the motivation in mind (assuming
it actually captures a real need), I would also like to suggest some
changes.

Motivation:
Most current solutions for migration with a passthrough device are based
on PCI hotplug, but that has side effects and doesn't work for all devices.

For NIC devices:
The PCI hotplug solution can work around network device migration
by switching between the VF and PF.

This is just more confusion. Hotplug is just a way to add and remove
devices. Switching VF and PF is up to the guest and hypervisor.

This is a combination. Because the current world cannot migrate device
state during migration (which is what we are doing), existing solutions
for migrating a VM with a passthrough NIC rely on PCI hotplug: unplug the
VF before starting migration and switch the network from the VF NIC to a
PV NIC in order to maintain the network connection, then plug the VF back
in after migration and switch from the PV NIC back to the VF. The bond
driver provides a way to switch between the PV and VF NIC automatically
while keeping the same IP and MAC, so it is the preferred mechanism.


But switching network interfaces introduces service downtime.

I tested the service downtime by putting the VF and PV interfaces
into a bonded interface and pinging the bonded interface while
plugging and unplugging the VF.
1) About 100ms when adding the VF
2) About 30ms when deleting the VF

OK and what's the source of the downtime?
I'm guessing that's just ARP being repopulated. So simply save and
re-populate it.

That would be a much cleaner solution.

Or maybe there's a timer there that just delays hotplug
for no reason. Fix it, everyone will benefit.
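
(A minimal sketch of the save/re-populate idea, assuming a Linux guest:
after failing over from the VF to the PV NIC, the guest can send a
gratuitous ARP so that peers and switches update their tables immediately
instead of waiting for cache expiry. The interface name, MAC, and IP below
are placeholders, and the bonding driver can already send gratuitous ARPs
on failover by itself; this just illustrates the mechanism.)

/* Sketch: announce our MAC/IP binding with a gratuitous ARP.
 * Needs CAP_NET_RAW; "eth0" and the addresses are placeholders. */
#include <arpa/inet.h>
#include <linux/if_ether.h>     /* ETH_P_ARP, ETH_P_IP */
#include <net/if.h>             /* if_nametoindex */
#include <net/if_arp.h>         /* ARPHRD_ETHER, ARPOP_REQUEST */
#include <netinet/if_ether.h>   /* struct ether_arp */
#include <netpacket/packet.h>   /* struct sockaddr_ll */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int send_gratuitous_arp(const char *ifname,
                               const unsigned char mac[6],
                               struct in_addr ip)
{
    /* SOCK_DGRAM packet socket: the kernel builds the Ethernet header
     * from the sockaddr_ll below. */
    int fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ARP));
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    /* Gratuitous ARP: a broadcast ARP request for our own IP, with the
     * sender and target IP both set to our address. */
    struct ether_arp arp;
    memset(&arp, 0, sizeof(arp));
    arp.ea_hdr.ar_hrd = htons(ARPHRD_ETHER);
    arp.ea_hdr.ar_pro = htons(ETH_P_IP);
    arp.ea_hdr.ar_hln = 6;
    arp.ea_hdr.ar_pln = 4;
    arp.ea_hdr.ar_op  = htons(ARPOP_REQUEST);
    memcpy(arp.arp_sha, mac, 6);
    memcpy(arp.arp_spa, &ip, 4);
    memset(arp.arp_tha, 0xff, 6);
    memcpy(arp.arp_tpa, &ip, 4);

    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ARP);
    addr.sll_ifindex  = if_nametoindex(ifname);
    addr.sll_halen    = 6;
    memset(addr.sll_addr, 0xff, 6);  /* L2 broadcast destination */

    ssize_t n = sendto(fd, &arp, sizeof(arp), 0,
                       (struct sockaddr *)&addr, sizeof(addr));
    close(fd);
    return n == (ssize_t)sizeof(arp) ? 0 : -1;
}

int main(void)
{
    unsigned char mac[6] = {0x52, 0x54, 0x00, 0x12, 0x34, 0x56}; /* placeholder */
    struct in_addr ip;
    inet_pton(AF_INET, "192.168.0.10", &ip);                     /* placeholder */
    return send_gratuitous_arp("eth0", mac, ip) ? 1 : 0;
}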

It also requires the guest to do switch configuration.

That's just wrong. If you want a switch, you need to
configure a switch.

I meant the configuration of the switching operation between the PV and VF NICs.


These are hard to manage and deploy for our customers.

So the kernel wants to remain flexible, and the stack is
configurable. Downside: customers need to deploy userspace
to configure it. Your solution: a hard-coded configuration
within the kernel and hypervisor. Sorry, this makes no sense.
If the kernel is easier for you to deploy than userspace,
you need to rethink your deployment strategy.

This is one factor.


To maintain PV performance during
migration, the host side also needs to assign a VF to the PV device.
This affects scalability.

No idea what this means.

These factors block SR-IOV NIC passthrough usage in cloud services and
OPNFV, which badly need high network performance and stability.

Everyone needs performance and scalability.


For other kinds of devices, this approach is hard to make work.
We are also adding migration support for the QAT (QuickAssist Technology) device.

QAT device use case introduction:
Server, networking, big data, and storage applications use QuickAssist
Technology to offload servers from handling compute-intensive operations,
such as:
1) Symmetric cryptography functions including cipher operations and
authentication operations
2) Public key functions including RSA, Diffie-Hellman, and elliptic curve
cryptography
3) Compression and decompression functions including DEFLATE and LZS

PCI hotplug will not work for such devices during migration, and these
operations will fail when the device is unplugged.

So we are trying to implement a new solution that really migrates
device state to the target machine, doesn't affect the user during
migration, and has low service downtime.

Let's assume for the sake of the argument that there's a lot going on
and removing the device is just too slow (though you should figure out
what's going on before giving up and just building something new from
scratch).

No. We can use a PV NIC as a backup for the VF NIC during migration, but
that doesn't work for other kinds of devices, since there is no backup
for them. E.g. when migration happens while a user is compressing files
via QAT, it's impossible to remove the QAT device at that point; doing so
would make the compression operation fail and hurt the user experience.


I still don't think you should be migrating state. That's just too
fragile, and it also means you depend on the driver to be nice and shut
down the device on the source, so you cannot migrate at will. Instead,
reset the device on the destination and re-initialize it.


Yes, saving and restoring device state relies on the driver, so we are
reworking the driver to make it more migration-friendly.
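
To make that concrete, here is a purely hypothetical sketch of the kind
of hooks a migration-friendly driver might expose; the names are invented
for illustration and are not the interface of the actual patches:

/* Purely hypothetical sketch: hook names invented for illustration,
 * not the interface of the patches under discussion. */
#include <stddef.h>
#include <sys/types.h>

struct mig_dev;                 /* opaque device handle, assumed elsewhere */

struct mig_dev_ops {
    int     (*freeze)(struct mig_dev *dev);           /* quiesce DMA/IRQs */
    ssize_t (*save_state)(struct mig_dev *dev,        /* serialize regs,  */
                          void *buf, size_t len);     /* ring indices ... */
    int     (*restore_state)(struct mig_dev *dev,     /* load state on    */
                             const void *buf,         /* the destination  */
                             size_t len);
    int     (*resume)(struct mig_dev *dev);           /* restart device   */
};

/* Source side, during the stop-and-copy phase: stop the device from
 * changing state, then capture that state for transfer. */
ssize_t migration_source_stop(struct mig_dev *dev,
                              const struct mig_dev_ops *ops,
                              void *buf, size_t len)
{
    int ret = ops->freeze(dev);
    if (ret < 0)
        return ret;
    return ops->save_state(dev, buf, len);
}

/* Destination side, before resuming the guest: load the captured state
 * into the already-probed device, then let it run. */
int migration_dest_start(struct mig_dev *dev,
                         const struct mig_dev_ops *ops,
                         const void *buf, size_t len)
{
    int ret = ops->restore_state(dev, buf, len);
    if (ret < 0)
        return ret;
    return ops->resume(dev);
}

The point is that the save/restore format and the quiesce semantics are
defined by the driver itself, which is why the driver has to be reworked
for migration.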




