From: zhanghailiang
Subject: [Qemu-devel] [PATCH COLO-Frame v13 26/39] COLO failover: Shutdown related socket fd when do failover
Date: Tue, 29 Dec 2015 15:09:22 +0800

If the network connection between COLO's two sides is broken while the
colo/colo incoming thread is blocked in 'read'/'write' on the socket fd,
the thread will not detect the error until the connection times out,
which can take a long time.

Here we shut down all the related socket file descriptors in the failover
BH to wake up the blocked operation. Besides, we should close the
corresponding file descriptors only after the failover BH has shut them
down, or there will be an error.
Signed-off-by: zhanghailiang <address@hidden>
Signed-off-by: Li Zhijian <address@hidden>
Reviewed-by: Dr. David Alan Gilbert <address@hidden>
Cc: Dr. David Alan Gilbert <address@hidden>
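To illustrate the wake-up behaviour the commit message relies on, here is a minimal standalone sketch. It is not QEMU code: the function names are hypothetical, and a socketpair() stands in for the migration socket. It assumes Linux-like semantics, where shutdown() on a stream socket makes a read() blocked in another thread return immediately with 0 instead of waiting for a timeout.

```c
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the colo thread: blocks in read() on the fd. */
static void *blocked_reader(void *opaque)
{
    int fd = *(int *)opaque;
    char buf[16];
    ssize_t n = read(fd, buf, sizeof(buf));

    /* Hand read()'s return value back to the joining thread. */
    return (void *)(intptr_t)n;
}

/*
 * Returns the value read() produced in the blocked thread,
 * or -2 if the demo could not be set up.
 */
long demo_shutdown_wakes_reader(void)
{
    int sv[2];
    pthread_t tid;
    void *ret = NULL;

    /* Assumption: a local socket pair replaces the real network socket. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    if (pthread_create(&tid, NULL, blocked_reader, &sv[0]) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    /* Give the reader time to block; nothing is ever written to sv[1]. */
    poll(NULL, 0, 100);

    /* Stand-in for the failover BH: shut the fd down, do not close it. */
    shutdown(sv[0], SHUT_RDWR);

    pthread_join(tid, &ret);

    /* Closing is safe only now that no thread is using the fd. */
    close(sv[0]);
    close(sv[1]);
    return (long)(intptr_t)ret;
}
```

Calling demo_shutdown_wakes_reader() from a small main() should return promptly rather than hang; the reader sees end-of-file even though the peer never sent anything or closed its end.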
---
v13:
- Add Reviewed-by tag
- Use a semaphore to notify the colo/colo incoming loop that failover work
  is finished.
v12:
- Shut down the fd of both QEMUFiles even though they may use the same fd
  (Dave's suggestion)
v11:
- Shut down each fd only once
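The ordering introduced in v13 (close the fd only after the failover handler has signalled completion) can be sketched outside QEMU as follows. This is an illustration under stated assumptions, not the patch's implementation: POSIX sem_t replaces QemuSemaphore, plain threads replace the failover BH and the colo thread, and struct channel and all function names are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical container for the fd and the "failover finished" semaphore. */
struct channel {
    int fd;
    sem_t failover_done;
};

/* Stand-in for the failover BH: shut the fd down, then signal completion. */
static void *failover_handler(void *opaque)
{
    struct channel *ch = opaque;

    shutdown(ch->fd, SHUT_RDWR);
    sem_post(&ch->failover_done);
    return NULL;
}

/* Stand-in for the colo thread: block in read(), then clean up. */
static void *worker(void *opaque)
{
    struct channel *ch = opaque;
    char buf[16];
    ssize_t n = read(ch->fd, buf, sizeof(buf));

    /*
     * Wait until the failover handler is done with the fd. Closing it
     * earlier could let the fd number be reused, so that the handler
     * would shut down an unrelated descriptor.
     */
    sem_wait(&ch->failover_done);
    close(ch->fd);
    ch->fd = -1;
    return (void *)(intptr_t)n;
}

/* Returns the worker's read() result, or -2 if setup failed. */
long demo_close_after_failover(void)
{
    int sv[2];
    struct channel ch;
    pthread_t w, f;
    void *ret = NULL;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -2;
    }
    ch.fd = sv[0];
    if (sem_init(&ch.failover_done, 0, 0) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    if (pthread_create(&w, NULL, worker, &ch) != 0) {
        sem_destroy(&ch.failover_done);
        close(sv[0]);
        close(sv[1]);
        return -2;
    }
    if (pthread_create(&f, NULL, failover_handler, &ch) != 0) {
        /* Unblock the worker ourselves so that it can exit cleanly. */
        shutdown(sv[0], SHUT_RDWR);
        sem_post(&ch.failover_done);
        pthread_join(w, NULL);
        sem_destroy(&ch.failover_done);
        close(sv[1]);
        return -2;
    }

    pthread_join(w, &ret);
    pthread_join(f, NULL);
    sem_destroy(&ch.failover_done);
    close(sv[1]);
    return (long)(intptr_t)ret;
}
```

The semaphore starts at 0, so the worker cannot reach close() before the handler has posted, whichever thread happens to run first.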
---
include/migration/migration.h | 3 +++
migration/colo.c | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/include/migration/migration.h b/include/migration/migration.h
index 14b9f3d..b34def6 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -112,6 +112,7 @@ struct MigrationIncomingState {
QemuThread colo_incoming_thread;
/* The coroutine we should enter (back) after failover */
Coroutine *migration_incoming_co;
+ QemuSemaphore colo_incoming_sem;
/* See savevm.c */
LoadStateEntry_Head loadvm_handlers;
@@ -175,6 +176,8 @@ struct MigrationState
QSIMPLEQ_HEAD(src_page_requests, MigrationSrcPageRequest)
src_page_requests;
/* The RAMBlock used in the last src_page_request */
RAMBlock *last_req_rb;
+
+ QemuSemaphore colo_sem;
};
void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/migration/colo.c b/migration/colo.c
index 6c17cc4..03920d3 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -59,6 +59,18 @@ static void secondary_vm_do_failover(void)
/* recover runstate to normal migration finish state */
autostart = true;
}
+ /*
+ * Make sure the colo incoming thread is not blocked in recv or send.
+ * If mis->from_src_file and mis->to_src_file use the same fd,
+ * the second shutdown() will return -1; we ignore this value
+ * because it is harmless.
+ */
+ if (mis->from_src_file) {
+ qemu_file_shutdown(mis->from_src_file);
+ }
+ if (mis->to_src_file) {
+ qemu_file_shutdown(mis->to_src_file);
+ }
old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
FAILOVER_STATUS_COMPLETED);
@@ -67,6 +79,8 @@ static void secondary_vm_do_failover(void)
"secondary VM", old_state);
return;
}
+ /* Notify COLO incoming thread that failover work is finished */
+ qemu_sem_post(&mis->colo_incoming_sem);
/* For Secondary VM, jump to incoming co */
if (mis->migration_incoming_co) {
qemu_coroutine_enter(mis->migration_incoming_co, NULL);
@@ -81,6 +95,18 @@ static void primary_vm_do_failover(void)
migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
MIGRATION_STATUS_COMPLETED);
+ /*
+ * Make sure the colo thread is not blocked in recv or send.
+ * s->rp_state.from_dst_file and s->to_dst_file may use the same fd,
+ * but shutting down the fd twice is harmless.
+ */
+ if (s->to_dst_file) {
+ qemu_file_shutdown(s->to_dst_file);
+ }
+ if (s->rp_state.from_dst_file) {
+ qemu_file_shutdown(s->rp_state.from_dst_file);
+ }
+
old_state = failover_set_state(FAILOVER_STATUS_HANDLING,
FAILOVER_STATUS_COMPLETED);
if (old_state != FAILOVER_STATUS_HANDLING) {
@@ -88,6 +114,8 @@ static void primary_vm_do_failover(void)
old_state);
return;
}
+ /* Notify COLO thread that failover work is finished */
+ qemu_sem_post(&s->colo_sem);
}
void colo_do_failover(MigrationState *s)
@@ -383,6 +411,14 @@ out:
qsb_free(buffer);
buffer = NULL;
+ /* Wait for the failover BH to finish; hopefully this does not take long */
+ qemu_sem_wait(&s->colo_sem);
+ qemu_sem_destroy(&s->colo_sem);
+ /*
+ * Must be called after the failover BH has completed,
+ * or the failover BH may shut down the wrong fd, one that
+ * has been reused by another thread after we release it here.
+ */
if (s->rp_state.from_dst_file) {
qemu_fclose(s->rp_state.from_dst_file);
}
@@ -391,6 +427,7 @@ out:
void migrate_start_colo_process(MigrationState *s)
{
qemu_mutex_unlock_iothread();
+ qemu_sem_init(&s->colo_sem, 0);
migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COLO);
colo_process_checkpoint(s);
@@ -430,6 +467,8 @@ void *colo_process_incoming_thread(void *opaque)
Error *local_err = NULL;
int ret;
+ qemu_sem_init(&mis->colo_incoming_sem, 0);
+
migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
MIGRATION_STATUS_COLO);
@@ -561,6 +600,10 @@ out:
*/
colo_release_ram_cache();
+ /* Wait for the failover BH to finish; hopefully this does not take long */
+ qemu_sem_wait(&mis->colo_incoming_sem);
+ qemu_sem_destroy(&mis->colo_incoming_sem);
+ /* Must be called after failover BH is completed */
if (mis->to_src_file) {
qemu_fclose(mis->to_src_file);
}
--
1.8.3.1