qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] live migration vs device assignment (motivation)


From: Lan, Tianyu
Subject: Re: [Qemu-devel] live migration vs device assignment (motivation)
Date: Thu, 10 Dec 2015 22:23:56 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 12/10/2015 4:38 PM, Michael S. Tsirkin wrote:
Let's assume you do save state and do have a way to detect
whether state matches a given hardware. For example,
driver could store firmware and hardware versions
in the state, and then on destination, retrieve them
and compare. It will be pretty common that you have a mismatch,
and you must not just fail migration. You need a way to recover,
maybe with more downtime.


Second, you can change the driver but you can not be sure it will have
the chance to run at all. Host overload is a common reason to migrate
out of the host.  You also can not trust guest to do the right thing.
So how long do you want to wait until you decide guest is not
cooperating and kill it?  Most people will probably experiment a bit and
then add a bit of a buffer. This is not robust at all.

Again, maybe you ask driver to save state, and if it does
not respond for a while, then you still migrate,
and driver has to recover on destination.


With the above in mind, you need to support two paths:
1. "good path": driver stores state on source, checks it on destination
    detects a match and restores state into the device
2. "bad path": driver does not store state, or detects a mismatch
    on destination. driver has to assume device was lost,
    and reset it

So what I am saying is, implement bad path first. Then good path
is an optimization - measure whether it's faster, and by how much.


These sound reasonable. Driver should have ability to do such check
to ensure hardware or firmware coherence after migration and reset device when migration happens at some unexpected position.


Also, it would be nice if on the bad path there was a way
to switch to another driver entirely, even if that means
a bit more downtime. For example, have a way for driver to
tell Linux it has to re-do probing for the device.

Just glace the code of device core. device_reprobe() does what you said.

/**
 * device_reprobe - remove driver for a device and probe for a new      driver
 * @dev: the device to reprobe
 *
 * This function detaches the attached driver (if any) for the given
 * device and restarts the driver probing process.  It is intended
 * to use if probing criteria changed during a devices lifetime and
 * driver attachment should change accordingly.
 */
int device_reprobe(struct device *dev)








reply via email to

[Prev in Thread] Current Thread [Next in Thread]