Re: [Qemu-devel] postcopy migration hangs while loading virtio state
From: Martin Schwidefsky
Subject: Re: [Qemu-devel] postcopy migration hangs while loading virtio state
Date: Tue, 4 Jul 2017 10:16:43 +0200
On Tue, 4 Jul 2017 09:48:11 +0200
Christian Borntraeger <address@hidden> wrote:
> On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (address@hidden) wrote:
> >> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:
> >>> * Christian Borntraeger (address@hidden) wrote:
> >>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
> >>>>
> >>>>>> Hmm, I have a theory, if the flags field has bit 1 set, i.e.
> >>>>>> RAM_SAVE_FLAG_COMPRESS
> >>>>>> then try changing ram_handle_compressed to always do the memset.
> >>>>>
> >>>>> FWIW, changing ram_handle_compressed to always memset makes the problem
> >>>>> go away.
> >>>>
> >>>> It is still running fine now with the "always memset change"
> >>>
> >>> Did we ever nail down a fix for this; as I remember Andrea said
> >>> we shouldn't need to do that memset, but we came to the conclusion
> >>> it was something specific to how s390 protection keys worked.
> >>>
> >>> Dave
> >>
> >> No we didn't. Let's merge that for now, with a comment that
> >> we don't really understand what's going on?
> >
> > Hmm no, I don't really want to change the !s390 behaviour, especially
> > since it causes allocation that we otherwise avoid, and Andrea's
> > reply to the original post pointed out it's not necessary.
>
>
> Since storage keys are per physical page we must not have shared pages.
> Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise),
> in other words we already allocate pages on the keyless->keyed switch.
>
> So why not do the same for zero pages - instead of invalidating the page
> table entry, lets just do a proper COW.
>
> Something like this maybe (Andrea, Martin any better way to do that?)
>
>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 4fb3d3c..11475c7 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
> static int __s390_enable_skey(pte_t *pte, unsigned long addr,
> unsigned long next, struct mm_walk *walk)
> {
> + struct page *page;
> /*
> - * Remove all zero page mappings,
> + * Remove all zero page mappings with a COW
> * after establishing a policy to forbid zero page mappings
> * following faults for that page will get fresh anonymous pages
> */
> - if (is_zero_pfn(pte_pfn(*pte)))
> - ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
> + if (is_zero_pfn(pte_pfn(*pte))) {
> + if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
> + put_page(page);
> + else
> + return -ENOMEM;
> + }
> /* Clear storage key */
> ptep_zap_key(walk->mm, addr, pte);
> return 0;
I do not quite get the problem here. The zero-page mappings are always
marked as _PAGE_SPECIAL. These should be safe to replace with an empty
pte. We do mark all VMAs as unmergeable prior to the page table walk
that replaces all zero-page mappings. What is the get_user_pages() call
supposed to do?
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.