[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Stable-8.1.2 26/45] migration: Fix race that dest preempt thread close
From: |
Michael Tokarev |
Subject: |
[Stable-8.1.2 26/45] migration: Fix race that dest preempt thread close too early |
Date: |
Wed, 4 Oct 2023 11:01:47 +0300 |
From: Peter Xu <peterx@redhat.com>
We hit intermit CI issue on failing at migration-test over the unit test
preempt/plain:
qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current
= 4f hit_edge = 1
**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed:
(bad == 0)
(test program exited with status code -6)
Fabiano debugged into it and found that the preempt thread can quit even
without receiving all the pages, which can cause guest not receiving all
the pages and corrupt the guest memory.
To make sure preempt thread finished receiving all the pages, we can rely
on the page_requested_count being zero because preempt channel will only
receive requested page faults. Note, not all the faulted pages are required
to be sent via the preempt channel/thread; imagine the case when a
requested page is just queued into the background main channel for
migration, the src qemu will just still send it via the background channel.
Here instead of spinning over reading the count, we add a condvar so the
main thread can wait on it if that unusual case happened, without burning
the cpu for no good reason, even if the duration is short; so even if we
spin in this rare case is probably fine. It's just better to not do so.
The condvar is only used when that special case is triggered. Some memory
ordering trick is needed to guarantee it from happening (against the
preempt thread status field), so the main thread will always get a kick
when that triggers correctly.
Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-2-farosas@suse.de>
(cherry picked from commit cf02f29e1e3843784630d04783e372fa541a77e5)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
diff --git a/migration/migration.c b/migration/migration.c
index 5528acb65e..6a7122694a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -153,6 +153,7 @@ void migration_object_init(void)
qemu_sem_init(¤t_incoming->postcopy_qemufile_dst_done, 0);
qemu_mutex_init(¤t_incoming->page_request_mutex);
+ qemu_cond_init(¤t_incoming->page_request_cond);
current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
migration_object_check(current_migration, &error_fatal);
@@ -367,7 +368,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
* things like g_tree_lookup() will return TRUE (1) when found.
*/
g_tree_insert(mis->page_requested, aligned, (gpointer)1);
- mis->page_requested_count++;
+ qatomic_inc(&mis->page_requested_count);
trace_postcopy_page_req_add(aligned, mis->page_requested_count);
}
}
diff --git a/migration/migration.h b/migration/migration.h
index 6eea18db36..9a30216895 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -196,7 +196,10 @@ struct MigrationIncomingState {
/* A tree of pages that we requested to the source VM */
GTree *page_requested;
- /* For debugging purpose only, but would be nice to keep */
+ /*
+ * For postcopy only, count the number of requested page faults that
+ * still haven't been resolved.
+ */
int page_requested_count;
/*
* The mutex helps to maintain the requested pages that we sent to the
@@ -210,6 +213,14 @@ struct MigrationIncomingState {
* contains valid information.
*/
QemuMutex page_request_mutex;
+ /*
+ * If postcopy preempt is enabled, there is a chance that the main
+ * thread finished loading its data before the preempt channel has
+ * finished loading the urgent pages. If that happens, the two threads
+ * will use this condvar to synchronize, so the main thread will always
+ * wait until all pages received.
+ */
+ QemuCond page_request_cond;
/*
* Number of devices that have yet to approve switchover. When this reaches
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..5408e028c6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -599,6 +599,30 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState
*mis)
if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
/* Notify the fast load thread to quit */
mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
+ /*
+ * Update preempt_thread_status before reading count. Note: mutex
+ * lock only provide ACQUIRE semantic, and it doesn't stops this
+ * write to be reordered after reading the count.
+ */
+ smp_mb();
+ /*
+ * It's possible that the preempt thread is still handling the last
+ * pages to arrive which were requested by guest page faults.
+ * Making sure nothing is left behind by waiting on the condvar if
+ * that unlikely case happened.
+ */
+ WITH_QEMU_LOCK_GUARD(&mis->page_request_mutex) {
+ if (qatomic_read(&mis->page_requested_count)) {
+ /*
+ * It is guaranteed to receive a signal later, because the
+ * count>0 now, so it's destined to be decreased to zero
+ * very soon by the preempt thread.
+ */
+ qemu_cond_wait(&mis->page_request_cond,
+ &mis->page_request_mutex);
+ }
+ }
+ /* Notify the fast load thread to quit */
if (mis->postcopy_qemufile_dst) {
qemu_file_shutdown(mis->postcopy_qemufile_dst);
}
@@ -1277,8 +1301,20 @@ static int qemu_ufd_copy_ioctl(MigrationIncomingState
*mis, void *host_addr,
*/
if (g_tree_lookup(mis->page_requested, host_addr)) {
g_tree_remove(mis->page_requested, host_addr);
- mis->page_requested_count--;
+ int left_pages = qatomic_dec_fetch(&mis->page_requested_count);
+
trace_postcopy_page_req_del(host_addr, mis->page_requested_count);
+ /* Order the update of count and read of preempt status */
+ smp_mb();
+ if (mis->preempt_thread_status == PREEMPT_THREAD_QUIT &&
+ left_pages == 0) {
+ /*
+ * This probably means the main thread is waiting for us.
+ * Notify that we've finished receiving the last requested
+ * page.
+ */
+ qemu_cond_signal(&mis->page_request_cond);
+ }
}
qemu_mutex_unlock(&mis->page_request_mutex);
mark_postcopy_blocktime_end((uintptr_t)host_addr);
--
2.39.2
- [Stable-8.1.2 05/45] target/ppc: Sign-extend large decrementer to 64-bits, (continued)
- [Stable-8.1.2 05/45] target/ppc: Sign-extend large decrementer to 64-bits, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 06/45] hw/ppc: Always store the decrementer value, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 24/45] ui/vnc: fix debug output for invalid audio message, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 27/45] migration: Fix possible race when setting rp_state.error, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 19/45] hw/arm/boot: Set SCR_EL3.FGTEn when booting kernel, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 20/45] target/arm: Don't skip MTE checks for LDRT/STRT at EL0, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 21/45] meson.build: Make keyutils independent from keyring, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 23/45] hw/scsi/scsi-disk: Disallow block sizes smaller than 512 [CVE-2023-42467], Michael Tokarev, 2023/10/04
- [Stable-8.1.2 18/45] include/exec: Widen tlb_hit/tlb_hit_page(), Michael Tokarev, 2023/10/04
- [Stable-8.1.2 25/45] ui/vnc: fix handling of VNC_FEATURE_XVP, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 26/45] migration: Fix race that dest preempt thread close too early,
Michael Tokarev <=
- [Stable-8.1.2 17/45] tests/file-io-error: New test, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 22/45] accel/tcg: mttcg remove false-negative halted assertion, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 29/45] migration: Fix possible race when shutting down to_dst_file, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 30/45] migration: Remove redundant cleanup of postcopy_qemufile_src, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 28/45] migration: Fix possible races when shutting down the return path, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 31/45] migration: Consolidate return path closing code, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 33/45] migration: Move return path cleanup to main migration thread, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 32/45] migration: Replace the return path retry logic, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 34/45] softmmu: Use async_run_on_cpu in tcg_commit, Michael Tokarev, 2023/10/04
- [Stable-8.1.2 35/45] accel/tcg: Avoid load of icount_decr if unused, Michael Tokarev, 2023/10/04