qemu-devel

Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket t


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed
Date: Fri, 19 May 2017 13:51:43 +0100
User-agent: Mutt/1.8.2 (2017-04-18)

* Daniel P. Berrange (address@hidden) wrote:
> On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > We don't really have a return path for the other types yet. Let's check
> > this when .get_return_path() is called.
> > 
> > For this, we introduce a new feature bit, and set it up only for socket
> > typed IO channels.
> > 
> > This will help detect earlier failure for postcopy, e.g., logically
> > speaking postcopy cannot work with "exec:". Before this patch, when we
> > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > With this patch, we'll get:
> > 
> > (qemu) migrate -d exec:cat>out
> > Unable to open return-path for postcopy
> 
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.

The thing is, it didn't use to be able to; prior to your conversion to
channels, postcopy would refuse to start with exec: because it
couldn't open a return path, so it was safe.
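
To make the "refuse early" idea concrete, here is a minimal sketch of
gating the return path on a per-channel feature bit. The type, enum and
function names are made up for illustration and are not QEMU's actual
channel API; only the error string is taken from Peter's example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature bits; these names are illustrative, not QEMU's. */
enum chan_feature {
    CHAN_FEATURE_SHUTDOWN    = 1u << 0,
    CHAN_FEATURE_RETURN_PATH = 1u << 1,
};

/* Hypothetical channel: just a name and the features it advertises. */
struct chan {
    const char *name;
    unsigned features;
};

static bool chan_has_feature(const struct chan *c, enum chan_feature f)
{
    assert(c != NULL);
    return (c->features & f) != 0;
}

/* Returns 0 if a return path may be opened on c, -1 otherwise. */
static int open_return_path(const struct chan *c)
{
    if (!chan_has_feature(c, CHAN_FEATURE_RETURN_PATH)) {
        fprintf(stderr, "Unable to open return-path for postcopy\n");
        return -1;
    }
    return 0;
}
```

With this shape, a channel created for a socket would set the
return-path bit and one created for exec: would not, so postcopy fails
at setup time rather than part way through the migration.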

> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

Hmm, we shouldn't get a 'hang' with a postcopy on a link without a
return path; but it does depend on how the exec: behaves on the
destination.
If the destination discards data written to it, then I think the
behaviour would be:
   a) Page requests would just get dropped, they'd eventually get
fulfilled by the background page transmissions, but that could mean that
a page request would wait for minutes for the page.
   b) The qemu main thread on the destination can be blocked by that, so
the monitor might not respond until the page request is fulfilled.
   c) I'm not quite sure what would happen to the source return-path
thread.
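
As a back-of-the-envelope check on (a): if requests are dropped, a page
only arrives when the background stream reaches it, so in the worst case
the wait is roughly the remaining RAM divided by the link bandwidth. The
helper below is a hypothetical illustration with invented numbers, not
anything taken from the migration code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough upper bound on how long a dropped page request waits, assuming
 * the requested page is the last one the background stream sends and
 * the stream runs at a constant rate.
 */
static double worst_case_wait_secs(uint64_t remaining_bytes,
                                   uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (double)remaining_bytes / (double)bytes_per_sec;
}
```

For example, 4 GiB still to send at 32 MiB/s gives 128 seconds, which
is where "minutes" comes from.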

The behaviour seems to have changed between 2.9.0 (f26 package) and my
reasonably recent head build.

2.9.0 gives me:
(qemu) migrate_set_speed 1B
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d "exec:cat > out"
RP: Received invalid message 0x0000 length 0x0000
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off 
compress: off events: off postcopy-ram: on x-colo: off release-ram: off 
Migration status: failed
total time: 0 milliseconds

So that's the return path thread trying to read from the exec:, not
getting anything, and failing.

On head-ish it doesn't fail and the source qemu doesn't hang; however,
the migration never completes, possibly because it's waiting for
the MIG_RP_MSG_SHUT from the destination.
A migrate_cancel also doesn't work for 'exec' because it doesn't
support shutdown(); it just sticks in 'cancelling'.
On a socket that was broken like this the cancel would work because
it issues a shutdown(), which causes the socket to clean up.
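
The difference is easy to see outside QEMU. The sketch below is plain
POSIX, not QEMU code, and assumes the exec: transport is backed by pipes
to the child. It shows that shutdown() on a socket makes a following
read() return EOF, while on a pipe shutdown() itself fails with
ENOTSOCK. It does not demonstrate waking a reader already blocked in
another thread.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 0 if shutdown() succeeded on fd and the following read() saw
 * EOF, the errno value if shutdown() failed, or -1 on anything else.
 */
static int shutdown_and_read(int fd)
{
    char buf[1];

    assert(fd >= 0);
    if (shutdown(fd, SHUT_RDWR) < 0) {
        return errno;               /* ENOTSOCK when fd is a pipe */
    }
    /* After shutdown() the read returns 0 (EOF) instead of blocking. */
    return read(fd, buf, sizeof(buf)) == 0 ? 0 : -1;
}

/* Socket case: a connected AF_UNIX pair stands in for a migration socket. */
static int try_socket(void)
{
    int sv[2], r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    r = shutdown_and_read(sv[0]);
    close(sv[0]);
    close(sv[1]);
    return r;
}

/* Pipe case: stands in for the read side of an exec: child. */
static int try_pipe(void)
{
    int pfd[2], r;

    if (pipe(pfd) < 0) {
        return -1;
    }
    r = shutdown_and_read(pfd[0]);
    close(pfd[0]);
    close(pfd[1]);
    return r;
}
```

Here try_socket() is expected to return 0 and try_pipe() to return
ENOTSOCK, which matches the cancel working on a socket and getting
stuck on exec:.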

Personally I'd rather fix this by still not supporting exec:.
Making shutdown() work on exec (by kill'ing the child process)
would mean that at least cancel works, but it still wouldn't be pretty
for a postcopy, and it still doesn't help Peter solve this problem,
which is a nasty one that QEMU has had for ages.
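
For what it's worth, the kill-the-child idea can be sketched in plain
POSIX as below. This is an illustration only, not a patch against the
channel code: the function name is made up, and a child that just sits
in pause() stands in for an exec: command that never sends anything
back.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 0 if, after killing and reaping the child, read() on the
 * pipe from the child sees EOF; -1 on any failure.
 */
static int kill_child_unblocks_reader(void)
{
    int pfd[2];
    char buf[1];
    pid_t pid;
    ssize_t n;

    if (pipe(pfd) < 0) {
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: hold the write end open and never send anything. */
        close(pfd[0]);
        for (;;) {
            pause();
        }
    }

    assert(pid > 0);
    close(pfd[1]);              /* parent keeps only the read end */

    /* The emulated shutdown(): terminate the child and reap it. */
    if (kill(pid, SIGKILL) < 0 || waitpid(pid, NULL, 0) < 0) {
        close(pfd[0]);
        return -1;
    }

    /* Every write end is closed now, so this returns 0 (EOF). */
    n = read(pfd[0], buf, sizeof(buf));
    close(pfd[0]);
    return n == 0 ? 0 : -1;
}
```

Without the kill, the same read() would block for as long as the child
kept its end of the pipe open, which is the stuck 'cancelling' state.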

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


