qemu-devel

Re: [Qemu-devel] [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC PATCH 11/17] COLO ctl: implement colo checkpoint protocol
Date: Fri, 12 Sep 2014 12:57:46 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Hongyang Yang (address@hidden) wrote:
> 
> 
> On 09/12/2014 07:17 PM, Dr. David Alan Gilbert wrote:
> >* Hongyang Yang (address@hidden) wrote:
> >>
> >>
> >>On 08/01/2014 11:03 PM, Dr. David Alan Gilbert wrote:
> >>>* Yang Hongyang (address@hidden) wrote:
> >
> ><snip>
> >
> >>>>+static int do_colo_transaction(MigrationState *s, QEMUFile *control,
> >>>>+                               QEMUFile *trans)
> >>>>+{
> >>>>+    int ret;
> >>>>+
> >>>>+    ret = colo_ctl_put(s->file, COLO_CHECKPOINT_NEW);
> >>>>+    if (ret) {
> >>>>+        goto out;
> >>>>+    }
> >>>>+
> >>>>+    ret = colo_ctl_get(control, COLO_CHECKPOINT_SUSPENDED);
> >>>
> >>>What happens at this point if the slave just doesn't respond?
> >>>(i.e. the socket doesn't drop - you just don't get the byte).
> >>
> >>If the socket returns bytes that were not expected, exit. If the
> >>socket returns an error, do some cleanup and quit the COLO process.
> >>Refer to colo_ctl_get() and colo_ctl_get_value().
> >
> >But what happens if the slave just doesn't respond at all? E.g.
> >if the slave host loses power, it will take a while (many seconds)
> >before the socket times out.
> 
> It will wait until the call returns a timeout error, and then do some
> cleanup and quit the COLO process.

If it were to wait here ~30 seconds for the timeout, what would happen
to the primary? Would it be stopped from sending any network traffic
for those 30 seconds? I think that's too long to wait before failing over.

> Is there a better way to handle this?

In postcopy I always handle reads coming back from the destination
in a separate thread, so that a blocking read can't stall the main
thread's outgoing stream (I originally did that using async reads, but
the thread is nicer). You could also use something like poll() with a
shorter timeout, set to however long you are happy for COLO to go
before it fails.
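For illustration, here is a minimal sketch of the poll() approach in plain POSIX C. It assumes the master has the raw socket fd and that each control message is a single byte. The helper name colo_wait_for_byte() and its return codes are hypothetical, not QEMU APIs. QEMU's colo_ctl_get() reads through a QEMUFile, so this is not a drop-in replacement. The point is that a silent slave turns into -ETIMEDOUT after timeout_ms, instead of a multi-second TCP timeout:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed on the monotonic clock since *start. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper (not a QEMU API): wait at most timeout_ms for one
 * control byte on fd and check that it equals 'expected'.
 * Returns 0 on match, -ETIMEDOUT if nothing arrived in time, -EPROTO on an
 * unexpected byte, -ECONNRESET if the peer closed, or -errno on error.
 */
static int colo_wait_for_byte(int fd, uint8_t expected, int timeout_ms)
{
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        long remaining = timeout_ms - elapsed_ms(&start);
        if (remaining < 0) {
            remaining = 0;          /* deadline passed: one non-blocking check */
        }
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int n = poll(&pfd, 1, (int)remaining);
        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* signal: retry with the time left */
            }
            return -errno;
        }
        if (n == 0) {
            return -ETIMEDOUT;      /* slave silent too long: caller fails over */
        }
        uint8_t byte;
        ssize_t r = read(fd, &byte, 1);
        if (r == 1) {
            return byte == expected ? 0 : -EPROTO;
        }
        if (r == 0) {
            return -ECONNRESET;     /* peer closed the connection */
        }
        if (errno != EINTR && errno != EAGAIN) {
            return -errno;
        }
    }
}
```

On -ETIMEDOUT the caller would start failover rather than keep waiting. The separate reader thread keeps the main thread from blocking at all, but would presumably still need some deadline like this to decide when the slave is dead.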

Dave
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


