qemu-devel

Re: [PATCH v13 0/5] UFFD write-tracking migration/snapshots


From: Andrey Gruzdev
Subject: Re: [PATCH v13 0/5] UFFD write-tracking migration/snapshots
Date: Thu, 11 Feb 2021 21:15:53 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 11.02.2021 20:18, Peter Xu wrote:
On Thu, Feb 11, 2021 at 12:21:51PM +0300, Andrey Gruzdev wrote:
On 09.02.2021 23:31, Peter Xu wrote:
On Tue, Feb 09, 2021 at 03:09:28PM -0500, Peter Xu wrote:
Hi, David, Andrey,

On Tue, Feb 09, 2021 at 08:06:58PM +0100, David Hildenbrand wrote:
Hi,

just stumbled over this, quick question:

I recently played with UFFD_WP and noticed that write protection is
only effective on pages/ranges that already have pages populated (IOW:
!pte_none() in the kernel).

In case memory was never populated (or was discarded using e.g.,
madvise(MADV_DONTNEED)), write-protection will be skipped silently and you
won't get WP events for applicable pages.

So if someone writes to a not-yet-populated page ("zero"), you won't
get WP events.

I can spot that you do a single uffd_change_protection() on the whole
RAMBlock.

How are you handling that scenario, or why don't you have to handle
that scenario?
Good catch..  Indeed I overlooked that as well when reviewing the code.

Hi David,

I really wonder if such a problem exists.. If we are talking about a
I immediately ran into this issue with my simplest test cases. :)

write to an unpopulated page, we should get first page fault on
non-present page and populate it with protection bits from respective vma.
For UFFD_WP vmas the page will be populated non-writable. So we'll get
another page fault on present but read-only page and go to handle_userfault.
The problem is even if the page is read-only, it does not yet have the uffd-wp
bit set, so it won't really trigger the handle_userfault() path.

You might have to register also for MISSING faults and place zero pages.
So I think what's missing for live snapshot is indeed to register with both
missing & wp mode.

Then we'll receive two messages: For wp, we do as before.  For missing, we do
UFFDIO_ZEROPAGE and at the same time dump this page as a zero page.
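
A minimal sketch of that idea, not the QEMU implementation: register the range
in both modes, then tell the two kinds of fault apart by the WP flag in the
message. The names fault_action, register_missing_and_wp(), classify_fault()
and place_zero_page() are hypothetical; only the UFFDIO_* / UFFD_* constants
and structs come from <linux/userfaultfd.h>. It assumes a uffd that was opened
and had its API negotiated elsewhere, and kernel headers new enough to define
the WP constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical names for what the fault-handling thread should do. */
enum fault_action {
    FAULT_IGNORE,            /* not a page-fault event */
    FAULT_WP_SAVE_PAGE,      /* wp fault: save old contents, then unprotect */
    FAULT_MISSING_ZERO_PAGE, /* missing fault: place zero page, dump as zero */
};

/* Register [addr, addr + len) for both missing and write-protect faults. */
int register_missing_and_wp(int uffd, void *addr, size_t len)
{
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_WP,
    };
    return ioctl(uffd, UFFDIO_REGISTER, &reg);
}

/*
 * Assumption: the range is registered only for MISSING and WP, so any
 * page fault without the WP flag is treated as a missing fault.
 */
enum fault_action classify_fault(const struct uffd_msg *msg)
{
    if (msg->event != UFFD_EVENT_PAGEFAULT)
        return FAULT_IGNORE;
    if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)
        return FAULT_WP_SAVE_PAGE;
    return FAULT_MISSING_ZERO_PAGE;
}

/* Resolve a missing fault by mapping a zero page at the faulting page. */
int place_zero_page(int uffd, uint64_t fault_addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    struct uffdio_zeropage zp = {
        .range = { .start = fault_addr & ~(page_size - 1), .len = page_size },
        .mode = 0,
    };
    return ioctl(uffd, UFFDIO_ZEROPAGE, &zp);
}
```

The snapshot side would record the page as zero before (or together with)
calling place_zero_page(), so the guest's later write cannot leak into the
image.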

I bet live snapshot didn't encounter this issue simply because normal live
snapshots would still work, especially when there's the guest OS. Say, the
worst case is we could have migrated some zero pages with some random data
filled in along with the snapshot, however all these pages were zero pages and
not used by the guest OS after all, then when we load a snapshot we won't
easily notice either..
I'm thinking of some way to verify this from the live snapshot pov, and I've got
an idea, so I'll just share it...  Maybe we need a guest application that does
something like below:

   - mmap() a huge lot of memory

   - call mlockall(), so that pages will be provisioned in the guest but without
     data written.  IIUC on the host these pages should be backed by missing
     pages as long as guest app doesn't write.  Then...

   - the app starts to read input from user:

     - If user inputs "dirty" and enter: it'll start to dirty the whole range.
       Write non-zero to the 1st byte of each page would suffice.

     - If user inputs "check" and enter: it'll read the whole memory chunk to
       see whether all the pages are zero pages.  If it reads any non-zero page,
       it should bail out and print error.
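
The guest application described above could be sketched roughly as follows.
This is only an illustration under assumptions: map_untouched(), dirty_range(),
count_nonzero_pages() and handle_command() are made-up names, the dirty byte
value is arbitrary, and the mlockall() step is left to the caller because it
may need a raised RLIMIT_MEMLOCK.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory without writing to any of it. */
unsigned char *map_untouched(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "dirty": write a non-zero value to the first byte of each page. */
void dirty_range(unsigned char *buf, size_t len, size_t page_size)
{
    assert(page_size > 0);
    for (size_t off = 0; off < len; off += page_size)
        buf[off] = 0xA5;
}

/* Count the pages that contain at least one non-zero byte. */
size_t count_nonzero_pages(const unsigned char *buf, size_t len,
                           size_t page_size)
{
    assert(page_size > 0);
    size_t bad = 0;
    for (size_t off = 0; off < len; off += page_size) {
        size_t n = len - off < page_size ? len - off : page_size;
        for (size_t i = 0; i < n; i++) {
            if (buf[off + i] != 0) {
                bad++;
                break;
            }
        }
    }
    return bad;
}

/*
 * Run one user command. Returns 0 on success, 1 if "check" found
 * non-zero pages, and -1 for an unknown command.
 */
int handle_command(const char *cmd, unsigned char *buf, size_t len,
                   size_t page_size)
{
    if (strcmp(cmd, "dirty") == 0) {
        dirty_range(buf, len, page_size);
        return 0;
    }
    if (strcmp(cmd, "check") == 0) {
        size_t bad = count_nonzero_pages(buf, len, page_size);
        if (bad != 0) {
            fprintf(stderr, "error: %zu non-zero page(s) found\n", bad);
            return 1;
        }
        puts("all pages are zero");
        return 0;
    }
    return -1;
}
```

A main() for this would map the region, optionally call mlockall(), and then
read lines from stdin, passing each one to handle_command().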

With the help of above program, we can do below to verify the live snapshot
worked as expected on zero pages:

   - Guest: start above program, don't input yet (so waiting to read either
     "dirty" or "check" command)

   - Host: start live snapshot

   - Guest: input "dirty" command, so start quickly dirtying the ram

   - Host: live snapshot completes

Then to verify the snapshot image, we do:

   - Host: load the snapshot we've got

   - Guest: (should still be in the state of waiting for cmd) this time we enter
     "check"

Thanks,

Hi David, Peter,

This is somewhat unexpected behavior, from my point of view, for UFFD write-protection.
So, does that mean that UFFD_WP protection/events work only for locked memory?
I'm now looking at the kernel implementation to understand..
Not really; it definitely works for all memory that we've touched.  My
previous example wanted to let the guest app use a not-yet-allocated page.  I
figured mlockall() might achieve that, hence I proposed such an example
assuming that may verify the zero page issue on live snapshot.  So if my
understanding is correct, if we run above scenario, current live snapshot might
fail that app when we do the "check" command at last, by finding non-zero pages.

Thanks,

Yes, I understand the limitations with vmas, which mean we can write-protect with PTE soft bits only.
I think mlockall() is not required; mmap() with MAP_POPULATE is enough, since the problem is related
only to pte_none() entries.
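
As a rough illustration of the MAP_POPULATE point (a sketch, with hypothetical
helper names map_region() and resident_pages()): mincore() can be used from
userspace to see whether pages of a mapping are resident. That is only an
approximate, point-in-time proxy for !pte_none(), and MAP_POPULATE itself is
best-effort, so treat the counts as a hint.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS, MAP_POPULATE, mincore() */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map anonymous memory, optionally asking the kernel to prefault it. */
unsigned char *map_region(size_t len, int populate)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | (populate ? MAP_POPULATE : 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count resident pages in [addr, addr + len); returns -1 on error. */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    size_t n = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(n);
    if (vec == NULL)
        return -1;
    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++)
            count += vec[i] & 1; /* bit 0 set means the page is resident */
    }
    free(vec);
    return count;
}
```

Comparing resident_pages() for a region mapped with populate set against one
mapped without it shows which ranges a single write-protect call could
actually cover.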


-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                virtuozzo.com
