Re: [PATCH 18/20] migration: Postcopy preemption enablement

qemu-devel

From:	Peter Xu
Subject:	Re: [PATCH 18/20] migration: Postcopy preemption enablement
Date:	Wed, 23 Feb 2022 21:05:34 +0800

On Wed, Feb 23, 2022 at 09:56:08AM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Tue, Feb 22, 2022 at 10:52:23AM +0000, Dr. David Alan Gilbert wrote:
> > > This does get a bit complicated, which worries me a bit; the code here
> > > is already quite complicated.
> > 
> > Right, that's the approach I chose in this patchset to solve the problem.
> > I'm not sure whether there's a better and easier way.
> > 
> > For example, we could have used a new thread to send requested pages, and
> > synchronized it with the main thread.  But that would bring a different
> > kind of complexity, and I can't quickly tell whether it would be better.
> > 
> > In this patch alone, more than half of the complexity comes from the
> > ability to interrupt the sending of a huge page half-way.  It's a bit of a
> > pity that this part will ultimately be a no-op once doublemap is available.
> 
> How does that huge-page interruption interact with recovery?
> i.e. do we know the start of that hugepage arrived?

That's a great question.  I should have mentioned it, but I forgot.

When postcopy is interrupted while a huge page is being sent, the dest QEMU
will not be able to do the UFFDIO_COPY for that huge page (because it lacks
data!), which also means the received bitmap for that huge page will stay
completely clear.
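To make the destination-side behaviour concrete, here is a minimal C model
(not QEMU code): small pages are staged into a temporary huge page, and only
when the last one arrives is the whole huge page placed (standing in for
UFFDIO_COPY) and its receivedmap bits set.  TmpHugePage,
receive_small_page(), place_huge_page() and the shrunken sizes (8 target
pages per "huge" page instead of 512) are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shrunken sizes: 8 target pages per "huge" page
 * (a real 2 MiB huge page holds 512 4 KiB target pages). */
#define TARGET_PAGE_SIZE 4096
#define PAGES_PER_HUGE   8
#define HUGE_PAGE_SIZE   (TARGET_PAGE_SIZE * PAGES_PER_HUGE)

/* Hypothetical staging area for one huge page being received. */
typedef struct {
    unsigned char buf[HUGE_PAGE_SIZE];
    size_t target_pages;             /* small pages staged so far */
} TmpHugePage;

/* Hypothetical receivedmap for a single huge page: one flag per target page. */
static bool receivedmap[PAGES_PER_HUGE];

/* Stand-in for UFFDIO_COPY: place the whole huge page at once, and only
 * then mark every target page inside it as received. */
static void place_huge_page(const TmpHugePage *tmp, unsigned char *guest)
{
    memcpy(guest, tmp->buf, HUGE_PAGE_SIZE);
    for (size_t i = 0; i < PAGES_PER_HUGE; i++) {
        receivedmap[i] = true;
    }
}

/* Stage one incoming small page.  Returns true if it was the last missing
 * piece and the huge page got placed; false while still incomplete. */
static bool receive_small_page(TmpHugePage *tmp, size_t index,
                               const unsigned char *data, unsigned char *guest)
{
    assert(index < PAGES_PER_HUGE);
    memcpy(tmp->buf + index * TARGET_PAGE_SIZE, data, TARGET_PAGE_SIZE);
    if (++tmp->target_pages == PAGES_PER_HUGE) {
        place_huge_page(tmp, guest);
        return true;
    }
    return false;
}

/* Number of target pages of the huge page currently marked as received. */
static size_t received_count(void)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGES_PER_HUGE; i++) {
        n += receivedmap[i] ? 1 : 0;
    }
    return n;
}
```

If the connection drops after the seventh small page, receive_small_page()
never reaches place_huge_page(), so received_count() stays at 0: from the
bitmap's point of view the huge page was never transferred.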

So when recovery happens, the dest QEMU will tell the source about this
("Hey, this huge page has never been transferred", even if a few of its small
pages actually have been!).  Then the whole huge page will be resent.
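On the source side, this can be pictured as rebuilding the dirty bitmap from
the destination's receivedmap: whatever the destination has not marked as
received is treated as dirty, and since a huge page is sent as a unit, a
single dirty target page is enough to resend the whole huge page.  A minimal
sketch under those assumptions; reload_dirty_bitmap(),
huge_pages_to_resend() and the tiny ramblock layout are hypothetical, not
QEMU's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: a ramblock of 4 huge pages, 8 target pages each. */
#define PAGES_PER_HUGE 8
#define NR_HUGE        4
#define NR_PAGES       (PAGES_PER_HUGE * NR_HUGE)

/* Rebuild the source dirty bitmap from the destination's receivedmap:
 * anything the destination has not placed must be sent again. */
static void reload_dirty_bitmap(const bool *receivedmap, bool *dirty)
{
    for (size_t i = 0; i < NR_PAGES; i++) {
        dirty[i] = !receivedmap[i];
    }
}

/* Huge-page granularity: if any target page of a huge page is dirty, the
 * whole huge page is resent.  Returns the number of huge pages to resend. */
static size_t huge_pages_to_resend(const bool *dirty)
{
    size_t count = 0;
    for (size_t h = 0; h < NR_HUGE; h++) {
        for (size_t i = 0; i < PAGES_PER_HUGE; i++) {
            if (dirty[h * PAGES_PER_HUGE + i]) {
                count++;
                break;
            }
        }
    }
    return count;
}
```

Here a huge page that was interrupted half-way looks exactly like one that
was never sent, since its receivedmap bits are all clear.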

When postcopy preempt joins the equation, we need to reset the temp huge
pages in postcopy_pause_incoming():

    /*
     * If network is interrupted, any temp page we received will be useless
     * because we didn't mark them as "received" in receivedmap.  After a
     * proper recovery later (which will sync src dirty bitmap with receivedmap
     * on dest) these cached small pages will be resent again.
     */
    for (i = 0; i < mis->postcopy_channels; i++) {
        postcopy_temp_page_reset(&mis->postcopy_tmp_pages[i]);
    }

This chunk of code lives in "migration: Introduce postcopy channels on dest
node" rather than in the recovery patch; I think that's the main reason why
it's easily overlooked.  However, it needs to be there so as not to break
existing postcopy.

This was hidden in the past because we didn't manage the temp huge pages
explicitly (they used to be local variables, so they got reset
automatically), but now we need to do that by hand.
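Why the reset matters can be shown with a small model (again not QEMU code:
TmpPage, tmp_page_reset(), tmp_page_add() and pause_incoming() are
hypothetical stand-ins for the per-channel temp pages,
postcopy_temp_page_reset() and postcopy_pause_incoming(); the field names are
assumptions).  Each channel keeps a count of staged small pages that now
outlives any single function call, so unless it is cleared on pause, the
whole-huge-page resend after recovery is counted on top of stale progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_HUGE 8   /* hypothetical, shrunken huge page */
#define NR_CHANNELS    2   /* hypothetical: main channel + one preempt channel */

/* Illustrative per-channel temp page state. */
typedef struct {
    size_t target_pages;   /* small pages staged so far */
    bool all_zero;         /* true until a non-zero small page arrives */
} TmpPage;

static void tmp_page_reset(TmpPage *p)
{
    p->target_pages = 0;
    p->all_zero = true;
}

/* Stage one small page; returns true when the huge page is complete and
 * would be placed (e.g. with UFFDIO_COPY). */
static bool tmp_page_add(TmpPage *p, bool page_is_zero)
{
    if (!page_is_zero) {
        p->all_zero = false;
    }
    return ++p->target_pages == PAGES_PER_HUGE;
}

/* On network failure: drop partially staged huge pages on every channel. */
static void pause_incoming(TmpPage *pages, size_t nr)
{
    for (size_t i = 0; i < nr; i++) {
        tmp_page_reset(&pages[i]);
    }
}
```

In this model, skipping the pause_incoming() call after a drop would leave
target_pages at, say, 3, so the resent huge page would be treated as complete
after only five fresh small pages.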

> 
> > 
> > However, I kept that code only because we don't know when doublemap will
> > be ready, let alone land.  Meanwhile, we can't assume all kernels will
> > have doublemap even in the future.
> 
> Yeh, if doublemap was already here you could make it a condition of
> allowing you to set the option.

Right.  Then we'll skip the huge page interruption entirely, just like when
the ramblock uses PAGE_SIZE small pages.
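The condition can be sketched as a predicate plus a sender loop that only
takes the mid-huge-page interruption path when the predicate holds.  This
models only when that path is taken, not how QEMU schedules the preempt
channel; SendCtx, kernel_doublemap (assumed to mean the destination can place
pieces of a huge page individually), urgent_request_at and send_host_page()
are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the page being sent (not QEMU's structures). */
typedef struct {
    size_t host_page_size;    /* ramblock page size, e.g. 2 MiB or 4 KiB */
    size_t target_page_size;  /* e.g. 4 KiB */
    bool kernel_doublemap;    /* assumed: dest can place pieces of a huge page */
} SendCtx;

/* Simulated urgent page request: it "arrives" once this many chunks of the
 * current host page have been sent; SIZE_MAX means never. */
static size_t urgent_request_at = SIZE_MAX;

static bool urgent_request_pending(size_t chunks_sent)
{
    return chunks_sent >= urgent_request_at;
}

/* Interruption is only needed when a host page spans several target pages
 * and the destination cannot place those target pages individually. */
static bool may_interrupt_huge_page(const SendCtx *c)
{
    return c->host_page_size > c->target_page_size && !c->kernel_doublemap;
}

/* Send one host page in target-page chunks.  If interruption is allowed and
 * an urgent request shows up, stop early and return the number of chunks
 * sent so far; a real sender would resume the remainder later. */
static size_t send_host_page(const SendCtx *c)
{
    size_t chunks = c->host_page_size / c->target_page_size;
    bool interruptible = may_interrupt_huge_page(c);
    size_t sent = 0;

    for (size_t i = 0; i < chunks; i++) {
        /* ... transmit chunk i here ... */
        sent++;
        if (interruptible && sent < chunks && urgent_request_pending(sent)) {
            break;
        }
    }
    return sent;
}
```

With doublemap or PAGE_SIZE pages, may_interrupt_huge_page() is false and
the interruption branch is never taken.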

-- 
Peter Xu
