[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O
From: |
Daniel P. Berrange |
Subject: |
[Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O |
Date: |
Fri, 26 Aug 2011 11:59:28 +0100 |
From: "Daniel P. Berrange" <address@hidden>
There are two common cases where migrate_cancel is intended to be
used
1. When migration is not converging due to an overactive
guest and insufficient network bandwidth
2. When migration is stuck due a network outage, waiting
for the TCP transmit timeout to occurr & return an I/O
error for send()
In the second case, if you attempt to use 'migrate_cancel' it
will also get stuck. This can be seen by attempting to migrate
to a QEMU which has been SIGSTOP'd
$ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
-monitor stdio -vnc :2 -incoming tcp:localhost:9000
QEMU 0.14.1 monitor - type 'help' for more information
(qemu)
<Ctrl-Z>
[1]+ Stopped
And in another shell
$ ./x86_64-softmmu/qemu-system-x86_64 -cdrom ~/boot.iso -m 600 \
-monitor stdio -vnc :1
QEMU 0.14.1 monitor - type 'help' for more information
(qemu) migrate -d tcp:localhost:9000
(qemu) info migrate
Migration status: active
transferred ram: 416 kbytes
remaining ram: 621624 kbytes
total ram: 623040 kbytes
(qemu) migrate_cancel
This last command will never return, until the first QEMU is
resumed. Looking at the stack trace in GDB you see
#0 0x0000003a8320e4c2 in __libc_send (fd=10, buf=0x1bc7c70, n=19777, flags=0)
at ../sysdeps/unix/sysv/linux/x86_64/send.c:28
#1 0x000000000048fb1e in socket_write (s=<optimized out>, buf=<optimized
out>, size=<optimized out>)
at migration-tcp.c:39
#2 0x000000000048eba4 in migrate_fd_put_buffer (opaque=0x1b76ad0,
data=0x1bc7c70, size=19777)
at migration.c:324
#3 0x000000000048e442 in buffered_flush (s=0x1b76b90) at buffered_file.c:87
#4 0x000000000048e4cf in buffered_close (opaque=0x1b76b90) at
buffered_file.c:177
#5 0x0000000000496d57 in qemu_fclose (f=0x1bbfc10) at savevm.c:479
#6 0x000000000048f4ca in migrate_fd_cleanup (s=0x1b76ad0) at migration.c:291
#7 0x000000000048f035 in do_migrate_cancel (mon=<optimized out>,
qdict=<optimized out>,
ret_data=<optimized out>) at migration.c:136[snip]
[snip]
The migration_fd_cleanup method is where the problem really starts.
Specifically it does
if (s->file) {
DPRINTF("closing file\n");
if (qemu_fclose(s->file) != 0) {
ret = -1;
}
s->file = NULL;
}
if (s->fd != -1)
close(s->fd);
And gets stuck in the qemu_fclose() bit because that method (rightly) tries
to flush all outstanding buffers before closing. Unfortunately while this is
desirable when migration ends successfully, it is undesirable when we are
failing/cancelling migration.
It is hard to tell qemu_fclose() that it shouldn't flush buffers directly,
so the alternative is to ensure that this method fails quickly when it
attempts I/O. This is easily achieved, simply by closing 's->fd' before
calling qemu_fclose.
Signed-off-by: Daniel P. Berrange <address@hidden>
---
migration.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/migration.c b/migration.c
index f5959b4..a432c3b 100644
--- a/migration.c
+++ b/migration.c
@@ -286,6 +286,13 @@ int migrate_fd_cleanup(FdMigrationState *s)
qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+ if ((s->state == MIG_STATE_ERROR ||
+ s->state == MIG_STATE_CANCELLED) &&
+ s->fd != -1) {
+ close(s->fd);
+ s->fd = -1;
+ }
+
if (s->file) {
DPRINTF("closing file\n");
if (qemu_fclose(s->file) != 0) {
--
1.7.6
- [Qemu-devel] [PATCH] Ensure migrate_cancel does not block doing I/O,
Daniel P. Berrange <=