qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket t


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed
Date: Fri, 19 May 2017 14:02:00 +0100
User-agent: Mutt/1.8.2 (2017-04-18)

* Daniel P. Berrange (address@hidden) wrote:
> On Fri, May 19, 2017 at 01:51:43PM +0100, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrange (address@hidden) wrote:
> > > On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > > > We don't really have a return path for the other types yet. Let's check
> > > > this when .get_return_path() is called.
> > > > 
> > > > For this, we introduce a new feature bit, and set it up only for socket
> > > > typed IO channels.
> > > > 
> > > > This will help detect earlier failure for postcopy, e.g., logically
> > > > speaking postcopy cannot work with "exec:". Before this patch, when we
> > > > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > > > With this patch, we'll get:
> > > > 
> > > > (qemu) migrate -d exec:cat>out
> > > > Unable to open return-path for postcopy
> > > 
> > > This is wrong - post-copy migration *can* work with exec: - it just 
> > > entirely
> > > depends on what command you are running. Your example ran a command which 
> > > is
> > > unidirectional, but if you ran 'exec:socat ...' you would have a fully
> > > bidirectional channel. Actually the channel is always bi-directional, but
> > > 'cat' simply won't ever send data back to QEMU.
> > 
> > The thing is it didn't used to be able to; prior to your conversion to
> > channel, postcopy would reject being started with exec: because it
> > couldn't open a return path, so it was safe.
> 
> It safe but functionally broken because it is valid to want to use
> exec migration with post-copy.
> 
> > > If QEMU hangs when the other end doesn't send data back, that actually 
> > > seems
> > > like a potentially serious bug in migration code. Even if using the normal
> > > 'tcp' migration protocol, if the target QEMU server hangs and fails to
> > > send data to QEMU on the return path, the source QEMU must never hang.
> > 
> > Hmm, we shouldn't get a 'hang' with a postcopy on a link without a
> > return path; but it does depend on how the exec: behaves on the
> > destination.
> > If the destination discards data written to it, then I think the
> > behaviour would be:
> >    a) Page requests would just get dropped, they'd eventually get
> > fulfilled by the background page transmissions, but that could mean that
> > a page request would wait for minutes for the page.
> >    b) The qemu main thread on the destination can be blocked by that, so
> > the monitor might not respond until the page request is fulfilled.
> >    c) I'm not quite sure what would happen to the source return-path
> > thread
> > 
> > The behaviour seems to have changed between 2.9.0 (f26 package) and my
> > reasonably recent head build.
> 
> That's due to the bug we just fixed where we mistakenly didn't
> allow bi-directional I/O for exec
> 
>   commit 062d81f0e968fe1597474735f3ea038065027372
>   Author: Daniel P. Berrange <address@hidden>
>   Date:   Fri Apr 21 12:12:20 2017 +0100
> 
>     migration: setup bi-directional I/O channel for exec: protocol
>     
>     Historically the migration data channel has only needed to be
>     unidirectional. Thus the 'exec:' protocol was requesting an
>     I/O channel with O_RDONLY on incoming side, and O_WRONLY on
>     the outgoing side.
>     
>     This is fine for classic migration, but if you then try to run
>     TLS over it, this fails because the TLS handshake requires a
>     bi-directional channel.
>     
>     Signed-off-by: Daniel P. Berrange <address@hidden>
>     Reviewed-by: Juan Quintela <address@hidden>
>     Signed-off-by: Juan Quintela <address@hidden>
> 
> 
> > A migration_cancel also doesn't work for 'exec' because it doesn't
> > support shutdown() - it just sticks in 'cancelling'.
> > On a socket that was broken like this the cancel would work because
> > it issues a shutdown() which causes the socket to cleanup.
> 
> I'm curious why migration_cancel requires shutdown() to work ? Why
> isn't it sufficient from the source POV to just close the socket
> entirely straight away.

close() closes the fd so that any other uses of the fd get an
error and you're at risk of the fd getting reallocated by something
else.  So consider if the main thread calls close(), the migration
thread and the return thread both carry on using that fd, which might
have been reallocated to something different.  Worse I think we came to the
consolution that on some OSs a read()/write() blocked in the use of an fd didn't
get kicked out by a close.

shutdown() is safe, in that it stops any other threads accessing the fd
but doesn't allow it's reallocation until the close;  We perform the
close only when we've joined all other threads that were using the fd.
Any of the threads that do new calls on the fd get an error and quickly
fall down their error paths.

Dave

> Regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]