qemu-devel

Re: [RFC] virtio_pmem: enable live migration support


From: Pankaj Gupta
Subject: Re: [RFC] virtio_pmem: enable live migration support
Date: Wed, 12 Jan 2022 16:44:59 +0100

Thank you David for replying!

> > From: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> >
> > Enable live migration support for the virtio-pmem device.
> > Tested with live migration on the same host.
> >
> > Need suggestions on the points below to support virtio-pmem live
> > migration between two separate host systems:
>
> I assume emulated NVDIMMs would have the exact same issue, right?
>
> There are two cases to consider I think:
>
> 1) Backing storage is migrated manually to the destination (i.e., a file
> that is copied/moved/transmitted during migration)
>
> 2) Backing storage is located on a shared network storage (i.e., a file
> that is not copied during migration)
>
> IIRC you're concerned about 2).

Yes.

>
> >
> > - There is still the possibility of stale page cache pages at the
> >   destination host, which we currently cannot invalidate as done in [1]
> >   for write-back mode, because the virtio-pmem memory backend file is
> >   mmapped into the guest address space and invalidating the corresponding
> >   page cache pages would also fault all other userspace process mappings
> >   of the same file.
> >   Or do we make it a strict rule that no other process mmaps this backing
> >   file?
>
> I'd have assumed that a simple fsync on the src once migration is about
> to switch over (e.g., pre_save/post_save handler) should be enough to
> trigger writeback to the backing storage, at which point the dst can
> take over. So handling the src is easy.
>
> So is the issue that the dst might still have stale pagecache
> information, because it already accessed some of that file previously,
> correct?

Yes.

>
> >
> >   -- In commit [1] we first fsync and then invalidate all the pages from
> >      the destination page cache. fsync would sync the stale dirty page
> >      cache pages. Is this the right thing to do, as we might end up with
> >      a data discrepancy?
>
> It would be weird if
>
> a) The src used/modified the file and fsync'ed the modifications back to
>    backing storage
> b) The dst has stale dirty pagecache pages that would result in a
>    modification of backing storage during fsync()

Yes. That is what I think we are currently doing with commit [1]; maybe
Stefan can confirm. If so, it is broken as well.

>
> I mean, that would be fundamentally broken, because the fsync() would
> corrupt the file. So I assume in a sane environment, the dst could only
> have stale clean pagecache pages. And we'd have to get rid of these to
> re-read everything from file.

In write-back cache mode, we could still have stale dirty pages at the
destination host, and a destination fsync is not the right thing to do.
We need to invalidate these pages. (Can we also invalidate dirty pages
resident in the page cache with POSIX_FADV_DONTNEED? The man pages say
we cannot, unless I misunderstood them.)
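
To make the question concrete, here is a minimal sketch for Linux of the
destination-side invalidation being discussed. The helper names are
hypothetical and this is not QEMU code. As far as I read posix_fadvise(2),
POSIX_FADV_DONTNEED leaves pages that have not yet been written out
unaffected, so this call alone would only get rid of clean pages. The
second helper uses mincore(2) to count how many pages of the file are
still resident, which is one way to observe what actually got dropped.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop cached pages of the whole
 * file. Returns 0 on success or an error number (posix_fadvise() returns
 * the error directly instead of setting errno).
 */
int drop_file_pagecache(int fd)
{
    assert(fd >= 0);
    /* offset 0 with len 0 covers the file from start to end. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}

/*
 * Hypothetical helper: count how many pages of the first len bytes of the
 * file are resident in the page cache. Returns -1 on error.
 */
long count_resident_pages(int fd, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec;
    long resident = -1;
    void *map;

    if (len == 0) {
        return 0;
    }
    /* A read-only shared mapping; creating it does not fault pages in. */
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        return -1;
    }
    vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++) {
            resident += vec[i] & 1; /* bit 0 set means page is resident */
        }
    }
    free(vec);
    munmap(map, len);
    return resident;
}
```

Calling count_resident_pages() before and after drop_file_pagecache() on
the destination would show whether anything stayed behind. The result is
only a snapshot and depends on the filesystem, so I would treat it as a
debugging aid, not as a correctness check.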

>
> IIRC, an existing mmap of the file on the dst should not really be
> problematic *as long as* we didn't actually access file content that way
> and faulted in the pages. So *maybe*, if we do the POSIX_FADV_DONTNEED
> on the dst before accessing file content via the mmap, there shouldn't
> be an issue. Unless the mmap itself is already problematic.

Could an mmap with share=on result in stale dirty page cache pages?

>
> I think we can assume that once QEMU starts on the dst and wants to mmap
> the file that it's not mapped into any other process yet. vhost-user
> will only mmap *after* being told from QEMU about the mmap region and
> the location in GPA.

Maybe we have an old stale dirty page cache page even if no process has
the file mmapped before we mmap the virtio-pmem backend file on the
destination?

>
> So if the existing QEMU mmap is not problematic, it should be easy, just
> do the POSIX_FADV_DONTNEED on the destination when initializing
> virtio-pmem. If we have to POSIX_FADV_DONTNEED *before* performing the
> mmap, we might need a way to tell QEMU to POSIX_FADV_DONTNEED before
> doing the mmap. This could be a parameter for memory-backend-file like
> "flush=on", or doing that implicitly when we're told that we expect an
> incoming migration.

Yes, that's what I had in mind. I just wanted to clear up some of my
doubts so that the implementation is correct. As I see it, page cache
coherency across multiple host systems with live migration either needs
to be addressed, or such scenarios need to be avoided.
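
A minimal sketch of the ordering described above (POSIX_FADV_DONTNEED
first, mmap second), assuming Linux. The function name and the drop_cache
flag are hypothetical; the flag only stands in for the proposed "flush=on"
parameter or for "we expect an incoming migration". This is not how
memory-backend-file is implemented today.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: map a backing file MAP_SHARED and,
 * if drop_cache is set, issue POSIX_FADV_DONTNEED on the whole file
 * before the mmap. Returns the mapping, or MAP_FAILED on any error.
 */
void *map_backing_file(const char *path, size_t len, bool drop_cache)
{
    void *map;
    int fd;

    assert(path != NULL && len > 0);

    fd = open(path, O_RDWR);
    if (fd < 0) {
        return MAP_FAILED;
    }

    /* Drop cached pages before any access through the mapping. */
    if (drop_cache && posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);
    return map;
}
```

Whether the advice has to come before the mmap, or only before the first
access through it, is exactly the open question in this thread. The
sketch simply picks the more conservative order.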


Thanks,
Pankaj
>
> --
> Thanks,
>
> David / dhildenb


