| From: | Maciej S. Szmigiero |
| Subject: | Re: [PATCH RFC 00/26] Multifd device state transfer support with VFIO consumer |
| Date: | Wed, 24 Apr 2024 00:25:08 +0200 |
| User-agent: | Mozilla Thunderbird |
On 24.04.2024 00:20, Peter Xu wrote:
> On Tue, Apr 23, 2024 at 06:15:35PM +0200, Maciej S. Szmigiero wrote:
>> On 19.04.2024 17:31, Peter Xu wrote:
>>> On Fri, Apr 19, 2024 at 11:07:21AM +0100, Daniel P. Berrangé wrote:
>>>> On Thu, Apr 18, 2024 at 04:02:49PM -0400, Peter Xu wrote:
>>>>> On Thu, Apr 18, 2024 at 08:14:15PM +0200, Maciej S. Szmigiero wrote:
>>>>>> I think one of the reasons for these results is that mixed
>>>>>> (RAM + device state) multifd channels participate in the RAM sync
>>>>>> process (MULTIFD_FLAG_SYNC) whereas device state dedicated channels
>>>>>> don't.
>>>>>
>>>>> Firstly, I'm wondering whether we can have better names for these new
>>>>> hooks.  Currently (only commenting on the async* stuff):
>>>>>
>>>>>   - complete_precopy_async
>>>>>   - complete_precopy
>>>>>   - complete_precopy_async_wait
>>>>>
>>>>> But perhaps better:
>>>>>
>>>>>   - complete_precopy_begin
>>>>>   - complete_precopy
>>>>>   - complete_precopy_end
>>>>>
>>>>> ?  As I don't see why the device must do something with async in such
>>>>> a hook.  To me it's more like you're splitting one process into
>>>>> multiple, so begin/end sounds more generic.
>>>>>
>>>>> Then, with that in mind, IIUC we can already split ram_save_complete()
>>>>> into >1 phases too.  For example, I would be curious whether the
>>>>> performance will go back to normal if we offload
>>>>> multifd_send_sync_main() into complete_precopy_end(), because we
>>>>> really only need one shot of that, and I am quite surprised it
>>>>> already greatly affects VFIO dumping its own things.
>>>>>
>>>>> I would even go one step further, as Dan was asking: have you thought
>>>>> about dumping VFIO states via multifd even during iterations?  Would
>>>>> that help even more than this series (which IIUC only helps during
>>>>> the blackout phase)?
>>>>
>>>> To dump during RAM iteration, the VFIO device will need to have dirty
>>>> tracking and iterate on its state, because the guest CPUs will still
>>>> be running, potentially changing VFIO state.  That seems impractical
>>>> in the general case.
>>>
>>> We already do such iterations in vfio_save_iterate()?
>>>
>>> My understanding is that the recent VFIO work is based on the fact that
>>> the VFIO device can track device state changes more or less (besides
>>> being able to save/load full states).  E.g. I still remember in our QE
>>> tests some old devices reported many more dirty pages than expected
>>> during the iterations, back when we were looking into an issue where a
>>> huge amount of dirty pages was reported.  But newer models seem to have
>>> fixed that and report much less.
>>>
>>> That issue was about GPUs, not NICs, though, and IIUC a major portion
>>> of such tracking used to be for GPU vRAMs.  So maybe I was mixing these
>>> up, and maybe they work differently.
>>
>> The device which this series was developed against (Mellanox ConnectX-7)
>> is already transferring its live state before the VM gets stopped (via
>> the save_live_iterate SaveVMHandler).  It's just that, in addition to
>> the live state, it has more than 400 MiB of state that cannot be
>> transferred while the VM is still running.  And that fact hurts a lot
>> with respect to the migration downtime.
>>
>> AFAIK it's a very similar story for (some) GPUs.
>
> So during the iteration phase VFIO cannot yet leverage the multifd
> channels with this series, am I right?
That's right.
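
(As a side note on the hook naming discussed further up in the thread: a
rough sketch of how such begin/end SaveVMHandlers entries could look is
below.  The signatures are modelled on the existing
save_live_complete_precopy hook and are only an illustration of the idea,
not what this RFC currently implements.)

  /* include/migration/register.h -- illustrative sketch only */
  typedef struct SaveVMHandlers {
      /* ... existing handlers ... */

      /* Existing synchronous completion hook. */
      int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);

      /*
       * Hypothetical split of the completion phase: "begin" would kick
       * off the device's own (possibly parallel) state transfer, "end"
       * would wait for it to finish, so that e.g. multifd_send_sync_main()
       * could then be issued just once after all devices reached "end".
       */
      int (*save_live_complete_precopy_begin)(QEMUFile *f, void *opaque);
      int (*save_live_complete_precopy_end)(QEMUFile *f, void *opaque);
  } SaveVMHandlers;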
> Is it possible to extend that use case too?
I guess so, but since this phase (iteration while the VM is still running) doesn't impact downtime, it is much less critical.
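
(For a rough sense of scale, with purely illustrative numbers rather than
measurements from this series: pushing ~400 MiB of device state during the
blackout phase over a single stream on a 25 Gbps link takes about
400 * 8 / 25000 ≈ 0.13 s, i.e. roughly 130 ms added to downtime; spread
over four such channels, assuming the aggregate bandwidth scales
accordingly, that drops to roughly 32 ms.  Data sent during the iteration
phase, by contrast, only adds to the total migration time, not to
downtime.)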
> Thanks,
Thanks,
Maciej