qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] postcopy livemigration proposal


From: Nadav Har'El
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
Date: Mon, 8 Aug 2011 13:59:10 +0300
User-agent: Mutt/1.4.2.2i

> >* What's is postcopy livemigration
> >It is is yet another live migration mechanism for Qemu/KVM, which
> >implements the migration technique known as "postcopy" or "lazy"
> >migration. Just after the "migrate" command is invoked, the execution
> >host of a VM is instantaneously switched to a destination host.

Sounds like a cool idea.

> >The benefit is, total migration time is shorter because it transfer
> >a page only once. On the other hand precopy may repeat sending same pages
> >again and again because they can be dirtied.
> >The switching time from the source to the destination is several
> >hunderds mili seconds so that it enables quick load balancing.
> >For details, please refer to the papers.

While these are the obvious benefits, the possible downside (that, as
always, depends on the workload) is the amount of time that the guest
workload runs more slowly than usual, waiting for pages it needs to
continue. There are a whole spectrum between the guest pausing completely
(which would solve all the problems of migration, but is often considered
unacceptible) and running at full-speed. Is it acceptable that the guest
runs at 90% speed during the migration? 50%? 10%?
I guess we could have nothing to lose from having both options, and choosing
the most appropriate technique for each guest!

> That's terrific  (nice video also)!
> Orit and myself had the exact same idea too (now we can't patent it..).

I think new implementation is not the only reason why you cannot patent
this idea :-) Demand-paged migration has actually been discussed (and done)
for nearly a quarter of a century (!) in the area of *process* migration.

The first use I'm aware of was in CMU's Accent 1987 - see [1].
Another paper, [2], written in 1991, discusses how process migration is done
in UCB's Sprite operating system, and evaluates the various alternatives
common at the time (20 years ago), including what it calls "lazy copying"
is more-or-less the same thing as "post copy". Mosix (a project which, in some
sense, is still alive to day) also used some sort of cross between pre-copying
(of dirty pages) and copying on-demand of clean pages (from their backing
store on the source machine).


References
[1] "Attacking the Process Migration Bottleneck"
     http://www.nd.edu/~dthain/courses/cse598z/fall2004/papers/accent.pdf
[2]  "Transparent Process Migration: Design Alternatives and the Sprite
     Implementation"
     http://nd.edu/~dthain/courses/cse598z/fall2004/papers/sprite-migration.pdf

> Advantages:
>         - Virtual machines are using more and more memory resources ,
>         for a virtual machine with very large working set doing live
>         migration with reasonable down time is impossible today.

If a guest actually constantly uses (working set) most of its allocated
memory, it will basically be unable to do any significant amount of work
on the destination VM until this large working set is transfered to the
destination. So in this scenario, "post copying" doesn't give any
significant advantages over plain-old "pause guest and send it to the
destination". Or am I missing something?

> Disadvantageous:
>         - During the live migration the guest will run slower than in
>         today's live migration. We need to remember that even today
>         guests suffer from performance penalty on the source during the
>         COW stage (memory copy).

I wonder if something like asynchronous page faults can help somewhat with
multi-process guest workloads (and modified (PV) guest OS). 

>         - Failure of the source or destination or the network will cause
>         us to lose the running virtual machine. Those failures are very
>         rare.

How is this different from a VM running on a single machine that fails?
Just that the small probability of failure (roughly) doubles for the
relatively-short duration of the transfer?


-- 
Nadav Har'El                        |           Monday, Aug  8 2011, 8 Av 5771
address@hidden             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |If glory comes after death, I'm not in a
http://nadav.harel.org.il           |hurry. (Latin proverb)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]