
Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH] migration: invalidate cache before source start
Date: Tue, 26 Jun 2018 11:44:55 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0

25.06.2018 20:50, Dr. David Alan Gilbert wrote:
* Dr. David Alan Gilbert (address@hidden) wrote:
* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
15.06.2018 15:06, Dr. David Alan Gilbert wrote:
* Vladimir Sementsov-Ogievskiy (address@hidden) wrote:
Invalidate cache before source start in case of failed migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
Why doesn't the code at the bottom of migration_completion (the
fail_invalidate: label) and the code in migrate_fd_cancel handle this?

What case did you see where those didn't handle it?
(Also, I guess it probably should set s->block_inactive = false.)
on source I see:

address@hidden:migrate_set_state new state 7
address@hidden:migration_thread_file_err
address@hidden:migration_thread_after_loop

so we are leaving the loop on

        if (qemu_file_get_error(s->to_dst_file)) {
            migrate_set_state(&s->state, current_active_state,
                              MIGRATION_STATUS_FAILED);
            trace_migration_thread_file_err();
            break;
        }

and skip migration_completion().
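The error-exit path above can be modeled as a minimal, self-contained sketch (hypothetical names; the real logic lives in migration_thread() in migration/migration.c): a stream error flips the state to FAILED and breaks out of the loop, so migration_completion() is never reached.

```c
#include <stdbool.h>

enum mig_state { MIG_ACTIVE, MIG_FAILED, MIG_COMPLETED };

struct mig_model {
    int file_error;          /* stands in for qemu_file_get_error() */
    enum mig_state state;
    bool completion_called;  /* migration_completion() would run here */
};

static void mig_thread_model(struct mig_model *s)
{
    for (;;) {
        if (s->file_error) {
            s->state = MIG_FAILED;
            break;           /* skips migration_completion() entirely */
        }
        s->completion_called = true;
        s->state = MIG_COMPLETED;
        break;
    }
}
```

With a nonzero file error, the model ends FAILED without ever calling completion, which is where the recovery would otherwise happen.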


John is right, this is an unrelated log; here we fail before inactivation, and there are no problems.

The actual problem is when we fail in postcopy_start, at the end. The source log looks like:

address@hidden:migrate_set_state new state 1
address@hidden:migration_fd_outgoing fd=101
address@hidden:migration_set_outgoing_channel ioc=0x56363454d630 ioctype=qio-channel-socket hostname=(null)
address@hidden:migration_bitmap_sync_start
address@hidden:migration_bitmap_sync_end dirty_pages 932
address@hidden:migrate_set_state new state 4
address@hidden:migration_thread_setup_complete
address@hidden:migrate_pending pending size 1107976192 max 0 (pre = 0 compat=1107976192 post=0)
address@hidden:migrate_set_state new state 5
Tap fd 33 disable, ret 0
address@hidden:migration_bitmap_sync_start
address@hidden:migration_bitmap_sync_end dirty_pages 1091
address@hidden:migrate_global_state_pre_save saved state: running
2018-06-26T08:29:56.439134Z qemu-kvm: postcopy_start: Migration stream errored -5
address@hidden:migrate_set_state new state 7
address@hidden:migration_thread_after_loop
Tap fd 33 enable, ret 0
address@hidden:migrate_fd_cleanup
qemu-kvm: block/io.c:1655: bdrv_co_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
2018-06-26 08:29:56.605+0000: shutting down, reason=crashed


Yeah, OK; I'd seen something else a few days ago, where a cancellation
test that had previously ended in the 'cancelled' state now ended up in
'failed' (which is the state 7 you have above).
I suspect there's something else going on as well; I think what is
supposed to happen in the 'cancel' case is that it spins in 'cancelling'
for a while in migrate_fd_cancel, and then at the bottom of
migrate_fd_cancel it does the recovery, but because it's going to
'failed' instead, it's jumping over that recovery.
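That recovery condition can be modeled roughly like this (a simplified, hypothetical sketch, not the real QEMU condition): the disk reactivation only runs when the migration actually ended up cancelled, so a transition to 'failed' skips it.

```c
#include <stdbool.h>

enum cancel_state { C_CANCELLING, C_CANCELLED, C_FAILED };

/* Simplified model of the bottom of migrate_fd_cancel(): recovery
 * (reactivating inactivated disks) runs only if the state reached
 * CANCELLED while the disks were inactive. */
static bool cancel_recovery_runs(enum cancel_state final_state,
                                 bool block_inactive)
{
    if (final_state == C_CANCELLED && block_inactive) {
        return true;   /* would invalidate caches, clear block_inactive */
    }
    return false;      /* FAILED jumps over the recovery */
}
```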
Going back and actually looking at the patch again, can I ask for one
small change: can you set s->block_inactive = false in the case where
you don't get the local_err (like we do at the bottom of
migrate_fd_cancel)?


Does that make sense?

Ok, I'll resend.

Hm, it looks like I'm fixing an outdated version (based on v2.9.0), and my reproducer isn't applicable to upstream.
But it looks like the current code can hit the same failure:

postcopy_start()
    ....
    ret = qemu_file_get_error(ms->to_dst_file);
    if (ret) {
        error_report("postcopy_start: Migration stream errored");

This leads to "return MIG_ITERATE_SKIP;" in migration_iteration_run, and
then the loop should finish, as the state should be
MIGRATION_STATUS_FAILED, so we will not call migration_completion().

Hm, I have some questions now:

1. Should we check s->block_inactive and, if it is false, not invalidate? This is done in migrate_fd_cancel(), but not done in migration_completion().

2. Should we call qemu_mutex_lock_iothread() like in migration_completion()? Why is it needed in migration_completion(), when the vm is not running?
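Question 1 could be expressed as a small guard, roughly like this (hypothetical names; just a model of the check, not a patch): only invalidate when the disks were actually inactivated, mirroring the s->block_inactive check that migrate_fd_cancel() has and migration_completion() lacks.

```c
#include <stdbool.h>

struct blk_model {
    bool block_inactive;
    int invalidate_calls;   /* stands in for bdrv_invalidate_cache_all() */
};

/* Invalidate caches only if the block layer was actually inactivated,
 * and clear the flag afterwards so a second call is a no-op. */
static void maybe_invalidate(struct blk_model *s)
{
    if (s->block_inactive) {
        s->invalidate_calls++;
        s->block_inactive = false;
    }
}
```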


Thanks,

Dave


---

   migration/migration.c | 9 ++++++++-
   1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1e99ec9b7e..8f39e0dc02 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2806,7 +2806,14 @@ static void migration_iteration_finish(MigrationState *s)
       case MIGRATION_STATUS_FAILED:
       case MIGRATION_STATUS_CANCELLED:
           if (s->vm_was_running) {
-            vm_start();
+            Error *local_err = NULL;
+            bdrv_invalidate_cache_all(&local_err);
+            if (local_err) {
+                error_reportf_err(local_err, "Can't invalidate disks before "
+                                  "source vm start");
+            } else {
+                vm_start();
+            }
           } else {
               if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
                   runstate_set(RUN_STATE_POSTMIGRATE);
--
2.11.1
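For reference, a resend incorporating the requested s->block_inactive = false change might look roughly like this (an untested sketch; the actual v2 may differ):

```diff
            if (s->vm_was_running) {
-            vm_start();
+            Error *local_err = NULL;
+            bdrv_invalidate_cache_all(&local_err);
+            if (local_err) {
+                error_reportf_err(local_err, "Can't invalidate disks before "
+                                  "source vm start");
+            } else {
+                s->block_inactive = false;
+                vm_start();
+            }
            } else {
```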

--
Dr. David Alan Gilbert / address@hidden / Manchester, UK

--
Best regards,
Vladimir




