Re: [Qemu-ppc] broken incoming migration


From: Peter Lieven
Subject: Re: [Qemu-ppc] broken incoming migration
Date: Thu, 30 May 2013 16:38:23 +0200



On 30.05.2013 at 15:41, "Paolo Bonzini" <address@hidden> wrote:

> On 30/05/2013 11:08, Peter Lieven wrote:
>> On 30.05.2013 10:18, Alexey Kardashevskiy wrote:
>>> On 05/30/2013 05:49 PM, Paolo Bonzini wrote:
>>>> On 30/05/2013 09:44, Alexey Kardashevskiy wrote:
>>>>> Hi!
>>>>> 
>>>>> I found the migration broken on pseries platform, specifically, this patch
>>>>> broke it:
>>>>> 
>>>>> f1c72795af573b24a7da5eb52375c9aba8a37972
>>>>> migration: do not sent zero pages in bulk stage
>>>>> 
>>>>> The idea is not to send zero pages to the destination guest which is
>>>>> expected to have 100% empty RAM.
>>>>> 
>>>>> However, on the pseries platform the guest always has some stuff in the RAM
>>>>> as a part of initialization (device tree, system firmware and rtas (?)) so
>>>>> it is not completely empty. As the source guest cannot detect this, it skips
>>>>> some pages during migration and we get a broken destination guest. Bug.
>>>>> 
>>>>> While the idea is ok in general, I do not see any easy way to fix it as
>>>>> neither QEMUMachine::init nor QEMUMachine::reset callbacks have information
>>>>> about whether we are about to receive a migration or not (-incoming
>>>>> parameter) and we cannot move device-tree and system firmware
>>>>> initialization anywhere else.
>>>>> 
>>>>> ram_bulk_stage is static and cannot be disabled from the platform
>>>>> initialization code.
>>>>> 
>>>>> So what would the community suggest?
>>>> Revert the patch. :)
>>> I'll wait for 24 hours (forgot to cc: the author) and then post a revert
>>> patch :)
>> does this problem only occur on pseries emulation?
> 
> Probably not.  On a PC, it would occur if you had 4K of zeros in the
> source BIOS but not in the destination BIOS.  When you reboot, the BIOS
> image is wrong.
> 
>> not sending zero pages is not only a performance benefit, it also makes
>> overcommitted memory usable. the MADV_DONTNEED seems to kick in asynchronously
>> and memory is not available immediately.
> 
> You could also scan the page for nonzero values before writing it.

i had this in mind, but then chose the other approach... turned out to be a
bad idea.

alexey: i will prepare a patch later today. could you then please verify that it
fixes your problem?

paolo: would we still need the madvise or is it enough to not write the zeroes?

Peter

> 
> Paolo
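
For illustration only, Paolo's suggestion ("scan the page for nonzero values before writing it") could look roughly like the sketch below. This is not the patch that was later posted and not QEMU code: the names page_is_zero, handle_zero_page and SKETCH_PAGE_SIZE are hypothetical, and a fixed 4K page is assumed. The idea is that the source keeps sending zero-page markers, and the destination writes zeroes only when its own copy of the page holds data (for example a device tree or firmware image placed there at init). Pages that are already zero are never written, so they stay unallocated on an overcommitted host.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed page size for this sketch; real targets may differ. */
#define SKETCH_PAGE_SIZE 4096

/* Return true if every byte in the page is zero. */
static bool page_is_zero(const uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (page[i] != 0) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical destination-side handler for an incoming zero page.
 * The page is written only if it currently holds nonzero data, so an
 * untouched page is read but never dirtied.
 * Returns true if the page had to be cleared.
 */
static bool handle_zero_page(uint8_t *dst, size_t len)
{
    if (page_is_zero(dst, len)) {
        return false;           /* already zero: skip the write */
    }
    memset(dst, 0, len);        /* stale init-time content: clear it */
    return true;
}
```

Whether an additional madvise(MADV_DONTNEED) is still needed on top of this is the open question Peter raises above; the sketch does not try to answer it.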



