
Re: [Qemu-devel] [RFC PATCH v4 4/5] separate thread for VM migration


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [RFC PATCH v4 4/5] separate thread for VM migration
Date: Wed, 17 Aug 2011 00:13:27 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20110707 Thunderbird/5.0

On 08/16/2011 08:56 PM, Umesh Deshpande wrote:
+            qemu_mutex_lock_ramlist();

Taken locks: iothread, ramlist

+            qemu_mutex_unlock_iothread();

Taken locks: ramlist

+            s->wait_for_unfreeze(s);
+            qemu_mutex_lock_iothread();

Taken locks: ramlist, iothread

You'd have a deadlock here if, hypothetically, two migrations ran at the same time: one thread could hold the ramlist lock while waiting for the iothread lock, while the other holds the iothread lock and waits for the ramlist lock.

+            qemu_mutex_unlock_ramlist();

But in general, why this locking? The buffered file need not know anything about the ram list and its mutex. Only ram_save_live needs to hold the ramlist lock---starting just before sort_ram_list and ending just after the end of stage 3. That should be part of patch 2.

Actually buffered_file.c should ideally not even take the iothread lock! The code there is only copying data from a private buffer to a file descriptor; neither is shared. It's migrate_fd_put_buffer that should take care of locking. This is an example of why keeping the separation of QEMUBufferedFile is a good idea at least for now.

I am still somewhat unconvinced about calling qemu_fclose from the migration thread. You still have one instance of cancellation in the iothread, when migrate_fd_release is called. Ideally, as soon as migration finishes or hits an error, you could trigger a bottom half that closes the file (which in turn joins the thread). Migration state notifiers should also run only from the iothread. Failure to do so (or, in general, the lack of a policy on what runs where) can lead to very difficult bugs---not so much hard to debug in this case (we have a global lock, so things cannot go _that_ bad), but hard to fix without redoing everything.

However, this patch is a good start (with the locking fixed). It will take several incremental steps to get there, including a considerable simplification once you take into account that migration can now block, so that wait_for_unfreeze can disappear. In the end it should probably be committed as a single patch, but I'm liking the patches more and more.

Paolo


