From: Isaku Yamahata
Subject: Re: [Qemu-devel] [PATCH v2 00/41] postcopy live migration
Date: Mon, 4 Jun 2012 22:38:16 +0900
User-agent: Mutt/1.5.19 (2009-01-05)

On Mon, Jun 04, 2012 at 08:37:04PM +0800, Anthony Liguori wrote:
> On 06/04/2012 05:57 PM, Isaku Yamahata wrote:
>> After a long time, here is v2. This is the qemu part.
>> The linux kernel part is sent separately.
>>
>> Changes v1 ->  v2:
>> - split up patches for review
>> - buffered file refactored
>> - many bug fixes
>>    Especially, PV drivers can now work with postcopy
>> - optimization/heuristic
>>
>> Patches
>> 1 - 30: refactoring existing code and preparation
>> 31 - 37: implement postcopy itself (essential part)
>> 38 - 41: some optimization/heuristic for postcopy
>>
>> Intro
>> =====
>> This patch series implements postcopy live migration.[1]
>> As discussed at KVM Forum 2011, a dedicated character device is used for
>> distributed shared memory between the migration source and destination.
>> Now we can discuss/benchmark/compare it with precopy. I believe there is
>> much room for improvement.
>>
>> [1] http://wiki.qemu.org/Features/PostCopyLiveMigration
>>
>>
>> Usage
>> =====
>> You need to load the umem character device on the host before starting
>> migration. Postcopy can be used with both the tcg and kvm accelerators.
>> The implementation depends only on the linux umem character device, and
>> the driver-dependent code is split out into its own file.
>> I tested only the host page size == guest page size case, but the
>> implementation allows the host page size != guest page size case.
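>>
>> For example, loading the driver might look like this (a rough sketch;
>> the exact module and device-node names come from the separately posted
>> kernel patches and are assumed here to be "umem" and /dev/umem):
>>
>>    # load the umem character device driver on the destination host
>>    modprobe umem        # or: insmod ./umem.ko for an out-of-tree build
>>    # verify that the character device node was created
>>    ls -l /dev/umem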
>>
>> The following options are added with this patch series.
>> - incoming part
>>    command line options
>>    -postcopy [-postcopy-flags <flags>]
>>    where <flags> changes the behavior for benchmarking/debugging.
>>    Currently the following flags are available:
>>    0: default
>>    1: enable touching page request
>>
>>    example:
>>    qemu -postcopy -incoming tcp:0:4444 -monitor stdio -machine accel=kvm
>>
>> - outgoing part
>>    options for the migrate command
>>    migrate [-p [-n] [-m]] URI [<prefault forward> [<prefault backward>]]
>>    -p: use postcopy migration
>>    -n: disable background transfer of pages; this is for
>>        benchmarking/debugging
>>    -m: move (rather than copy) pages in the background transfer of
>>        postcopy mode
>>    <prefault forward>: the number of following pages which are sent along
>>                        with an on-demand page
>>    <prefault backward>: the number of preceding pages which are sent along
>>                         with an on-demand page
>>
>>    example:
>>    migrate -p -n tcp:<dest ip address>:4444
>>    migrate -p -n -m tcp:<dest ip address>:4444 32 0
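>>
>>    In the second example (an illustration of the options above, not
>>    measured behavior), each on-demand fault additionally pulls in the 32
>>    pages following the faulted page and none preceding it.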
>>
>>
>> TODO
>> ====
>> - benchmark/evaluation. Especially, how async page faults affect the
>>   results.
>
> I don't mean to beat on a dead horse, but I really don't understand the 
> point of postcopy migration other than the fact that it's possible.  It's 
> a lot of code and a new ABI in an area where we already have too much 
> difficulty maintaining our ABI.
>
> Without a compelling real world case with supporting benchmarks for why 
> we need postcopy and cannot improve precopy, I'm against merging this.

Some new results are available at 
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_yamahata_postcopy.pdf

precopy assumes that the network bandwidth is wide enough and that the
number of dirty pages converges. But that doesn't always hold true.
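
As a back-of-the-envelope model (illustrative, not a benchmark result):
let B be the link bandwidth and D the rate at which the guest dirties its
pages. Each precopy round resends the pages dirtied during the previous
round, so the volume sent per round behaves roughly like

    V(n+1) = V(n) * (D / B),    with V(0) = guest RAM size

which shrinks only when D < B. When D >= B the rounds never converge, and
precopy must either impose a long downtime or never finish. Postcopy, by
contrast, bounds the total migration time at roughly (RAM size / B) plus
the cost of servicing page faults.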

- planned migration
  Predictability of the total migration time is important.

- dynamic consolidation
  In cloud use cases, the resources of a physical machine are usually
  overcommitted.
  When a physical machine becomes overloaded, some VMs are moved to another
  physical host to balance the load.
  Precopy can't move VMs promptly, and compression makes things worse.

- inter data center migration
  With L2-over-L3 technology, it is becoming common to create a virtual
  data center which actually spans multiple physical data centers.
  It is useful to migrate VMs across physical data centers for disaster
  recovery. The network bandwidth between DCs is narrower than in the LAN
  case, so the precopy assumption wouldn't hold.

- In cases where the network bandwidth is limited by QoS,
  the precopy assumption doesn't hold.


thanks,
-- 
yamahata


