Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy

From:	Peter Xu
Subject:	Re: [Qemu-devel] [RFC v2 10/33] migration: allow dst vm pause on postcopy
Date:	Fri, 13 Oct 2017 13:08:09 +0800
User-agent:	Mutt/1.5.24 (2015-08-30)

On Thu, Oct 12, 2017 at 01:19:52PM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (address@hidden) wrote:
> > On Tue, Oct 10, 2017 at 01:30:18PM +0100, Dr. David Alan Gilbert wrote:
> > > * Peter Xu (address@hidden) wrote:
> > > > On Mon, Oct 09, 2017 at 07:58:13PM +0100, Dr. David Alan Gilbert wrote:
> > 
> > [...]
> > 
> > > > > We have to be careful about this; a network can fail in a way it
> > > > > gets stuck rather than fails - this can get stuck until a full TCP
> > > > > disconnection; and that takes about 30mins (from memory).
> > > > > The nice thing about using 'shutdown' is that you can kill the existing
> > > > > connection if it's hung. (Which then makes an interesting question;
> > > > > the rules in your migrate-incoming command become different if you
> > > > > want to declare it's failed!).  Having said that, you're right that at
> > > > > this point stuff has already failed - so do we need the shutdown?
> > > > > (You might want to do the shutdown as part of the recovery earlier
> > > > > or as a separate command to force the failure)
> > > > 
> > > > I assume if I call shutdown before the lock then we'll be good then.
> > > 
> > > The question is what happens if you only allow recovery if we're already
> > > in postcopy-paused state; in the case of a hung socket, since no IO has
> > > actually failed yet, you will still be in postcopy-active.
> > 
> > Hmm, but isn't that a problem of kernel rather than QEMU?  Since
> > sockets are after all managed by kernel.
> 
> Kind of, but it comes down to what the right behaviour of a TCP socket
> is, and the kernel is probably doing the right thing.
> 
> > I don't really know what is the best thing to do to detect whether a
> > socket is stuck.  Assume we can observe that (say, we see migration
> > transferred bytes stay static for 30 seconds), IIRC you mentioned
> > iptables tricks to break an existing e.g. TCP connection, then we
> > can trigger the -EIO path.
> 
> From the qemu level I'd prefer to make it a command;  if we start
> adding heuristics and timeouts etc then it's very difficult to actually
> get them right.
> 
> > Or do you think we should provide a way to manually trigger the paused
> > state?  Then it goes back to something we discussed with Dan in the
> > earlier post - I'd appreciate if we can postpone the manual trigger
> > support a bit (to make this series small, which is already not...).
> 
> I think that manual trigger is probably necessary; it would just call a
> shutdown() on the sockets and let the things fail into the paused state.
> It'd be pretty simple.  It would be another OOB command; the tricky
> part is just making sure it's thread safe against the migration
> finishing when you issue it.
> 
> I think it can wait until after this series if you want, but it would
> be good if we can figure it out.

OK.  Let me try it in my next post.  I hope it won't grow into
something bigger (which does happen sometimes... :).

-- 
Peter Xu
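
For readers following the thread, here is a minimal sketch of the socket
behaviour the manual trigger would rely on.  It is not QEMU code:
force_channel_failure() and demo_forced_failure() are hypothetical names,
and an AF_UNIX socketpair stands in for the migration stream.  A peer that
never writes models the "hung" case: the read side sees no error, so
nothing would leave postcopy-active.  After shutdown() the same read
returns end-of-stream at once, which is the kind of failure that could
push the stream into the paused state.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not a QEMU function: force a stuck channel to
 * fail by shutting down both directions of its socket.
 */
static int force_channel_failure(int fd)
{
    assert(fd >= 0);
    return shutdown(fd, SHUT_RDWR);
}

/*
 * Returns 0 if the socket behaved as described (silent before
 * shutdown(), end-of-stream afterwards), -1 otherwise.
 */
static int demo_forced_failure(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int ok = -1;

    /* A local socket pair stands in for the migration stream. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }

    /*
     * The peer (sv[1]) never writes.  A non-blocking read reports
     * "would block", not an I/O error: nothing has failed yet.
     */
    n = recv(sv[0], buf, sizeof(buf), MSG_DONTWAIT);
    if (n != -1 || (errno != EAGAIN && errno != EWOULDBLOCK)) {
        goto out;
    }

    /* The manual trigger: kill the connection from our side. */
    if (force_channel_failure(sv[0]) < 0) {
        goto out;
    }

    /* A blocking read now returns immediately with 0 (end of stream). */
    n = recv(sv[0], buf, sizeof(buf), 0);
    if (n == 0) {
        ok = 0;
    }

out:
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Calling demo_forced_failure() from a small main() is enough to try it.
The sketch leaves out the part described above as tricky: a real command
would also need a lock (or similar) so that the fd cannot be closed by a
finishing migration while shutdown() is being issued.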
