From: Lei Li
Subject: Re: [Qemu-devel] [PATCH 0/17 v3] Localhost migration with side channel for ram
Date: Mon, 25 Nov 2013 15:29:36 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0

On 11/22/2013 07:36 PM, Paolo Bonzini wrote:
> On 22/11/2013 12:29, Lei Li wrote:
>> During the page flipping migration, the RAM pages of the source guest are flipped to the destination, which is why the source guest cannot be resumed. AFAICT, the page flipping migration may fail at the connection stage (including the exchange of the pipe fd) and at the migration register stage (say, any blocker such as an unsupported migration device),
>
> Unfortunately, some migration problems (e.g. misconfiguration of the destination QEMU) cannot be detected until the device data is migrated. This happens after RAM migration, so there is indeed a reliability problem.

Hi Paolo,

By 'some migration problems cannot be detected until the device data is migrated', do you mean that the outgoing migration has no idea of a failure on the incoming side caused by misconfiguration of the destination QEMU?

In that case, if the migration fails just because of misconfigured device state on the destination, and the outgoing migration is not aware of this failure, I think handling for it (such as synchronizing the device state list on the incoming side?) should be added to the current migration protocol, as it is kind of missing... It cannot rely only on resuming the source guest for such a failure... Or maybe it should be handled in the management app, by forcing the configuration to be right?

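To make the idea of synchronizing the device state list a bit more concrete, here is a minimal sketch in Python, purely for illustration. It assumes each side can enumerate its device state sections as (name, version) pairs before any RAM is transferred; `DeviceSection` and `find_mismatches` are hypothetical names and are not part of QEMU or of the current migration protocol. The version rule used here (the destination must support at least the version the source sends) is also an assumption.

```python
from typing import List, NamedTuple


class DeviceSection(NamedTuple):
    """Hypothetical description of one device state section."""
    name: str
    version_id: int


def find_mismatches(source: List[DeviceSection],
                    destination: List[DeviceSection]) -> List[str]:
    """Compare the two device lists before any page is flipped.

    Returns a list of human-readable problems; an empty list means the
    destination appears able to accept everything the source will send.
    """
    dest_versions = {s.name: s.version_id for s in destination}
    source_names = {s.name for s in source}
    problems = []

    for section in source:
        if section.name not in dest_versions:
            problems.append(f"{section.name}: missing on destination")
        elif dest_versions[section.name] < section.version_id:
            # Assumed rule: destination must support the source's version.
            problems.append(f"{section.name}: destination version too old")

    for section in destination:
        if section.name not in source_names:
            problems.append(f"{section.name}: not sent by source")

    return problems
```

If such a check reported any problem, the outgoing side could abort while the source guest is still resumable, instead of discovering the failure only after the RAM pages are gone.
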
> Postcopy would fix this (assuming the postcopy phase is reliable) by migrating device data before any page flipping occurs.

Are you suggesting that page flipping should be coupled with postcopy migration for the live upgrade of QEMU, as in your comments on the previous version?

> Paolo
>
>> but it could be resumed in such a situation, since the memory has not been flipped yet. Once the connection is successfully set up, it proceeds to the transmission of RAM pages, which hardly ever fails. As for the failure handling in Libvirt, ZhengSheng has proposed restarting the old QEMU instead of resuming it. I know 'hardly' is not a good answer to your concern, but it is the cost of the limited memory IMO. So if downtime is key for the user, or if there is zero tolerance for restarting QEMU, page flipping migration might not be a good choice. From the perspective of a management app like Libvirt, the 'live upgrade' of QEMU will be done through localhost migration, and there are other migration solutions with lower downtime, like real live migration and the postcopy migration that Paolo mentioned in the previous version [3]. Why not have more than one choice for it?

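The failure handling discussed in this thread can be summarized as a small sketch. It is Python for illustration only, with hypothetical names (`Stage`, `Recovery`, `recovery_for`) that do not exist in QEMU or Libvirt. It assumes the ordering described above: connection and registration come first, then RAM page flipping, then device state. Treating a device state failure like a RAM failure is an inference from the fact that device data is migrated after RAM.

```python
from enum import Enum


class Stage(Enum):
    CONNECT = "connect"            # includes the exchange of the pipe fd
    REGISTER = "register"          # migration blockers are detected here
    RAM_FLIP = "ram_flip"          # pages are flipped to the destination
    DEVICE_STATE = "device_state"  # device data is migrated after RAM


class Recovery(Enum):
    RESUME_SOURCE = "resume_source"
    RESTART_OLD_QEMU = "restart_old_qemu"


def recovery_for(failed_stage: Stage) -> Recovery:
    """Pick a recovery action for a failure at the given stage."""
    if failed_stage in (Stage.CONNECT, Stage.REGISTER):
        # No page has been flipped yet, so the source guest is intact.
        return Recovery.RESUME_SOURCE
    # Pages may already belong to the destination; the source guest
    # cannot be resumed, so the old QEMU would be restarted instead.
    return Recovery.RESTART_OLD_QEMU
```
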
-- Lei