qemu-devel

Re: [Qemu-devel] post-copy is broken?


From: Li, Liang Z
Subject: Re: [Qemu-devel] post-copy is broken?
Date: Thu, 28 Apr 2016 02:59:38 +0000

> -----Original Message-----
> From: Andrea Arcangeli [mailto:address@hidden
> Sent: Wednesday, April 27, 2016 10:48 PM
> To: Li, Liang Z
> Cc: Dr. David Alan Gilbert; Kirill A. Shutemov; address@hidden;
> Amit Shah; address@hidden; address@hidden; linux-
> address@hidden
> Subject: Re: post-copy is broken?
> 
> Hello Liang,
> 
> On Mon, Apr 18, 2016 at 10:33:14AM +0000, Li, Liang Z wrote:
> > If THP is disabled, there are no failures.
> > And your test always passed, even when real post-copy failed.
> >
> > In my env, the output of
> > 'cat /sys/kernel/mm/transparent_hugepage/enabled'  is:
> >
> >  [always] ...
> >
> 
> Can you test the fix?
> https://marc.info/?l=linux-mm&m=146175869123580&w=2
> 
> This was not a breakage in userfaultfd nor in postcopy. userfaultfd had no
> bugs and is fully rock solid and with zero chances of generating undetected
> memory corruption like it was happening in v4.5.
> 
> As I suspected, the same problem would have happened with any THP
> pmd_trans_huge split (swapping/inflating-balloon etc..). Postcopy just
> makes it easier to reproduce the problem because it does a scattered
> MADV_DONTNEED on the destination qemu guest memory for the pages
> redirtied during the last precopy pass that ran, or not transferred (to allow
> THP faults in destination qemu during precopy), just before starting the
> guest in the destination node.
> 
> Other reports of KVM memory corruption happening on v4.5 with THP
> enabled will also be taken care of by the above fix.
> 
> I hope I managed to fix this in time for v4.6 final (current is v4.6-rc5-69),
> so the only kernel where KVM must not be used with THP enabled will be v4.5.
> 
> On a side note, this MADV_DONTNEED trigger reminded me that as soon as the
> madvisev syscall is merged, loadvm_postcopy_ram_handle_discard should
> start using it to reduce the enter/exit kernel to just 1 (or a few madvisev in
> case we want to give a limit to the temporary buffer to avoid the risk of
> allocating too much temporary RAM for very large
> guests) to do the MADV_DONTNEED scattered zapping. Same thing in
> virtio_balloon_handle_output.
> 
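
For readers following the thread, the scattered MADV_DONTNEED zapping described above can be sketched roughly as below. This is only an illustration of the one-syscall-per-range pattern on private anonymous memory, not the code in loadvm_postcopy_ram_handle_discard; the helper names discard_scattered and map_filled are hypothetical, and the fixed stride stands in for whatever set of redirtied pages a real migration would discard.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map npages of private anonymous memory and fill
 * every byte with 0xAA so that discarded pages are easy to spot later. */
static unsigned char *map_filled(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    size_t len = npages * (size_t)page;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memset(p, 0xAA, len);
    return p;
}

/* Hypothetical helper: discard every stride-th page of the mapping,
 * entering the kernel once per page. Returns the number of pages
 * discarded, or -1 on error. */
static long discard_scattered(unsigned char *base, size_t npages, size_t stride)
{
    long page = sysconf(_SC_PAGESIZE);
    long discarded = 0;

    if (page <= 0 || stride == 0)
        return -1;
    /* madvise() requires a page-aligned start address. */
    assert((uintptr_t)base % (uintptr_t)page == 0);

    for (size_t i = 0; i < npages; i += stride) {
        if (madvise(base + i * (size_t)page, (size_t)page, MADV_DONTNEED) != 0)
            return -1;
        discarded++;
    }
    return discarded;
}
```

After discard_scattered(m, 16, 2), the even-numbered pages of m read back as zero on the next access while the odd-numbered pages keep their 0xAA contents. The loop makes one syscall per discarded range, which is the enter/exit overhead a vectored call would be meant to reduce.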

I have tested the patch and the guest doesn't crash anymore. I think the issue is fixed.
Thanks!

Liang
> Thanks,
> Andrea
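
As a small aside on the THP check quoted earlier in the thread: the sysfs file marks the active mode with square brackets, so a test harness could extract it along these lines. This is a minimal sketch; thp_active_mode is a hypothetical name, and it only parses a line of text that the caller has already read from /sys/kernel/mm/transparent_hugepage/enabled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) token from text such as
 * "[always] madvise never" into out. Returns 0 on success, or -1 if there
 * is no bracketed token or it does not fit in out. */
static int thp_active_mode(const char *text, char *out, size_t outlen)
{
    const char *open = strchr(text, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

A caller would read one line from the sysfs file with fgets() and pass it in; a result of "never" corresponds to the THP-disabled case in which no failures were seen.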


