From: Alex Bligh
Subject: Re: [Qemu-devel] When does live migration give up?
Date: Wed, 04 Sep 2013 19:05:50 +0100

Paolo,

--On 4 September 2013 19:07:53 +0200 Paolo Bonzini <address@hidden> wrote:

> On 04/09/2013 17:24, Alex Bligh wrote:
>> We have seen a situation when migrating about 50 VMs at once where
>> some of them fail. I think this is because they are dirtying pages
>> faster than they can be transmitted.
>
> No, migration never "gives up". It may never converge, but it keeps
> trying until cancelled.
>
> Could it be that you are choosing migration server ports from a small
> range, and some of them are failing because two migrations pick the
> same random port for the destination (which is where the server socket
> lies)?

Should not be that. We create FDs (which are sockets) and pass them in
at both ends.

Approx 10% of migrations die after many minutes on the customer's
platform. This does not appear to happen if migrations are not carried
out 50 at a time. We appear to be getting something other than 'ms'
returned through the monitoring system. Unhelpfully what that is is not
logged.

Is there anything (apart from the socket closing prematurely) which can
cause a failed migration after many minutes? We've seen problems where
the destination is not set up the same as the source (e.g. different
numbers of NICs) but IIRC that fails much earlier.

To make things easier (cough), this is qemu 1.0 (as shipped with Ubuntu
Precise).

--
Alex Bligh
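
One way to close the logging gap described above is to record whatever the monitor returns whenever it is not an expected value. The following is only a minimal sketch, assuming the monitoring system polls QMP `query-migrate` and has already parsed the JSON reply into a dict. The function name `describe_migration` and the logger name are hypothetical, and the handling of the two statuses `active` and `completed` is an assumption about what the poller treats as normal.

```python
import logging

log = logging.getLogger("migration-monitor")  # hypothetical logger name


def describe_migration(reply):
    """Return (status, finished) for a parsed query-migrate reply.

    Anything other than "active" or "completed" is logged together with
    the full raw reply, so the unexpected value is not lost.
    """
    info = reply.get("return", {})
    status = info.get("status")  # None if the reply carries no status

    if status == "active":
        return status, False  # still transferring, keep polling
    if status == "completed":
        return status, True

    # Unexpected status, error reply, or missing field: keep the evidence.
    log.error("migration ended with status %r; raw reply: %r", status, reply)
    return status, True
```

A poller would call this on each reply and stop once the second element is true; the log line then shows which status (or error object) ended the migration.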