qemu-devel

Re: [Qemu-devel] [RFC] postcopy livemigration proposal


From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] postcopy livemigration proposal
Date: Mon, 08 Aug 2011 16:42:33 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110516 Lightning/1.0b2 Thunderbird/3.1.10

On 08/08/2011 04:40 AM, Yaniv Kaul wrote:
On 08/08/2011 12:20, Dor Laor wrote:
On 08/08/2011 06:24 AM, Isaku Yamahata wrote:

Design/Implementation
=====================
The basic idea of postcopy live migration is to use a sort of distributed
shared memory between the migration source and destination.

The migration procedure looks like this:
- Start migration:
  stop the guest VM on the source and send the machine state, except
  guest RAM, to the destination.
- Resume the guest VM on the destination without guest RAM contents.
- Hook guest access to pages, and pull page contents from the source.
  This continues until all the pages are pulled to the destination.
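
The pull-on-access step can be illustrated with a small simulation. This is only a sketch, not QEMU code: the class names, the dict-based "RAM", and the background pull of untouched pages are all assumptions made for illustration, and the source freeing a page once it is sent follows the RAM-saving idea raised later in the thread.

```python
class SourceHost:
    """Hypothetical stand-in for the migration source, which still holds guest RAM."""

    def __init__(self, pages):
        self.pages = dict(pages)  # page number -> page contents

    def fetch(self, page_no):
        # Hand the page over and drop the local copy (assumed behaviour).
        return self.pages.pop(page_no)


class DestinationGuest:
    """Hypothetical destination that resumes with no guest RAM populated."""

    def __init__(self, source, num_pages):
        self.source = source
        self.num_pages = num_pages
        self.ram = {}      # pages already pulled from the source
        self.faults = 0    # number of on-demand pulls triggered by guest access

    def read(self, page_no):
        # Hooked guest access: a missing page is pulled from the source first.
        if page_no not in self.ram:
            self.faults += 1
            self.ram[page_no] = self.source.fetch(page_no)
        return self.ram[page_no]

    def pull_remaining(self):
        # Assumed background pull, so pages the guest never touches still arrive.
        for page_no in range(self.num_pages):
            if page_no not in self.ram:
                self.ram[page_no] = self.source.fetch(page_no)

    def complete(self):
        return len(self.ram) == self.num_pages


source = SourceHost({n: f"page-{n}" for n in range(8)})
guest = DestinationGuest(source, num_pages=8)
first = guest.read(3)    # faults, pulled from the source
second = guest.read(3)   # already local, no fault
guest.read(5)            # faults
guest.pull_remaining()   # migration is finished once this returns
```

Each page crosses the network exactly once, either because the guest touched it or because the background pull reached it.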

The big picture is depicted at
http://wiki.qemu.org/File:Postcopy-livemigration.png

That's terrific (nice video also)!
Orit and I had the exact same idea too (now we can't patent it..).

Advantages:
- No downtime due to memory copying.
- Efficient: reduces the needed traffic, since there is no need to
  re-send pages.
- Reduces overall RAM consumption of the source and destination
  compared with current live migration (where both the source and the
  destination allocate the memory until the live migration
  completes). We can free copied memory once the destination guest
  has received it, and save RAM.
- Increases parallelism for SMP guests: multiple virtual CPUs can
  handle their demand paging. Less time holding a global lock, less
  thread contention.
- Virtual machines are using more and more memory; for a virtual
  machine with a very large working set, live migration with
  reasonable downtime is impossible today.
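
The traffic and downtime claims can be made concrete with a toy model. This is a rough sketch under strong assumptions, not a measurement: `dirty_ratio` (pages dirtied by the guest per page sent), `max_rounds`, and `stop_threshold` are hypothetical parameters, and real pre-copy behaviour depends on the workload and bandwidth.

```python
def precopy(total_pages, dirty_ratio, max_rounds=30, stop_threshold=64):
    """Toy pre-copy model: each round re-sends the pages dirtied during the previous one."""
    sent = 0
    pending = total_pages
    rounds = 0
    while pending > stop_threshold and rounds < max_rounds:
        sent += pending
        # Pages dirtied while this round was being transferred (assumed proportional).
        pending = min(total_pages, int(pending * dirty_ratio))
        rounds += 1
    # Final stop-and-copy: the guest is paused while the remaining pages are sent.
    return {"sent": sent + pending, "downtime_pages": pending}


def postcopy(total_pages):
    """Toy post-copy model: every page is transferred once, none while the guest is paused."""
    return {"sent": total_pages, "downtime_pages": 0}


moderate = precopy(1000, dirty_ratio=0.5)   # converges after a few rounds
heavy = precopy(1000, dirty_ratio=1.0)      # working set dirtied as fast as it is sent
post = postcopy(1000)
```

In this model a guest that dirties pages as fast as they are sent never converges, so pre-copy ends with the whole of RAM copied during downtime, which is the large-working-set case described in the last bullet.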

Disadvantages:
- During the live migration the guest will run slower than in
  today's live migration. We need to remember that even today
  guests suffer a performance penalty on the source during the
  COW stage (memory copy).
- Failure of the source, the destination, or the network will cause
  us to lose the running virtual machine. Those failures are very
  rare.

I highly doubt that's acceptable in enterprise deployments.

I don't think you can make blanket statements about enterprise deployments.

A lot of enterprises are increasingly building fault tolerance into their applications expecting that the underlying hardware will fail. With cloud environments like EC2 that experience failure on a pretty regular basis, this is just becoming all the more common.

So I really don't view this as a critical issue. It certainly would be if it were the only mechanism available, but as long as we can also support pre-copy migration, it would be fine.

Regards,

Anthony Liguori


