Re: [Qemu-devel] post-copy is broken?
From: Kirill A. Shutemov
Subject: Re: [Qemu-devel] post-copy is broken?
Date: Fri, 15 Apr 2016 15:52:36 +0300
User-agent: Mutt/1.5.23.1 (2014-03-12)

On Thu, Apr 14, 2016 at 12:22:30PM -0400, Andrea Arcangeli wrote:
> Adding linux-mm too,
>
> On Thu, Apr 14, 2016 at 01:34:41PM +0100, Dr. David Alan Gilbert wrote:
> > * Andrea Arcangeli (address@hidden) wrote:
> >
> > > The next suspect is the massive THP refcounting change that went
> > > upstream recently:
> >
> > > As further debug hint, can you try to disable THP and see if that
> > > makes the problem go away?
> >
> > Yep, this seems to be the problem (cc'ing in Kirill).
> >
> > 122afea9626ab3f717b250a8dd3d5ebf57cdb56c - works (just before Kirill
> > disables THP)
> > 61f5d698cc97600e813ca5cf8e449b1ea1c11492 - breaks (when THP is reenabled)
> >
> > It's pretty reliable; as you say disabling THP makes it work again
> > and putting it back to THP/madvise mode makes it break. And you need
> > to test on a machine with some free ram to make sure THP has a chance
> > to have happened.
> >
> > I'm not sure of all of the rework that happened in that series,
> > but my reading of it is that splitting of THP pages gets deferred;
> > so I wonder if when I do the madvise to turn THP off, if it's actually
> > still got THP pages and thus we end up with a whole THP mapped
> > when I'm expecting to be userfaulting those pages.
>
> Good thing at least that I didn't make UFFDIO_COPY THP-aware yet, so there
> are fewer variables. No user has been interested in handling userfaults at
> THP granularity yet, and from userland such an improvement would be
> completely invisible in terms of API, so if a user starts doing that we
> can just optimize the kernel for it. CRIU restore could do that, as the
> faults will come from disk I/O; when the network is involved, THP
> userfaults wouldn't have a great tradeoff with regard to the increased
> fault latency.
>
> I suspect there is a handle_userfault call missing somewhere in connection
> with the trans_huge_pmd splits (no longer THP splits) that you're doing
> with MADV_DONTNEED to zap those pages in the destination that got
> redirtied in the source during the last precopy stage. Or, more simply,
> MADV_DONTNEED isn't zapping all the right ptes after the trans huge
> pmd got split.
>
> The fact that the page isn't split shouldn't matter too much; all we care
> about is that the pte triggers handle_userfault after MADV_DONTNEED.
>
> The userfaultfd testcase in the kernel isn't exercising this case,
> unfortunately. That should probably be improved too, so that there is a
> simpler way to reproduce than running precopy before postcopy in qemu.
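
As a starting point for such a simpler reproducer, the zapping half of the
sequence can be checked from userland without qemu. The sketch below is only
an illustration under stated assumptions: it uses Python's mmap module on
Linux, it does not register the range with userfaultfd, and it does not force
a THP to be allocated, so it only shows that a private anonymous range reads
back as zeroes after MADV_DONTNEED. The function name zapped_after_dontneed
is made up for this example.

```python
import mmap


def zapped_after_dontneed(num_pages: int = 4) -> bool:
    """Dirty a private anonymous range, zap it, and report whether it reads as zeroes.

    Hypothetical helper for illustration only; it does not involve userfaultfd.
    """
    length = num_pages * mmap.PAGESIZE
    # MAP_PRIVATE matters here: on a shared mapping MADV_DONTNEED does not
    # discard the data, so the zero-fill check below would not hold.
    region = mmap.mmap(
        -1,
        length,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    try:
        # Touch every page so there is something to zap.
        region[:] = b"\xaa" * length
        # Drop the pages; the next access should fault in zero-filled pages.
        region.madvise(mmap.MADV_DONTNEED, 0, length)
        return region[:] == bytes(length)
    finally:
        region.close()


if __name__ == "__main__":
    print("zapped:", zapped_after_dontneed())
```

A real reproducer would additionally have to register the range with
userfaultfd and confirm that the access after MADV_DONTNEED raises a
userfault instead of being served a zero page.
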
I've tested the current Linus tree and v4.5 using the qemu postcopy test case
on both x86-64 and i386, and it never failed for me:
/x86_64/postcopy: first_byte = 7e last_byte = 7d hit_edge = 1 OK
OK
/i386/postcopy: first_byte = f6 last_byte = f5 hit_edge = 1 OK
OK
I ran it directly, setting the relevant QTEST_QEMU_BINARY.
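
Since the failure reportedly depends on THP being enabled (always or madvise
mode), it may be worth confirming the active mode before each run. A minimal
sketch, assuming the usual sysfs file
/sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the
bracketed token; the helper names parse_thp_mode and current_thp_mode are
made up for this example:

```python
from pathlib import Path
from typing import Optional

# Usual location of the THP switch; assumed to exist on the test machine.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the bracketed (active) token, e.g. 'madvise' for 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Read the active THP mode, or None if the file is missing or unparsable."""
    try:
        return parse_thp_mode(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("THP mode:", current_thp_mode())
```

If this prints "never", or None on a kernel built without THP, a passing
postcopy run says little about the THP case.
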
--
Kirill A. Shutemov