From: Laine Stump
Subject: [Qemu-devel] race condition when exec'ing "qemu -incoming" followed by monitor "cont"
Date: Fri, 09 Apr 2010 12:03:54 -0400
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100330 Fedora/3.0.4-1.fc12 Thunderbird/3.0.4

(Please forgive (and correct!) any inaccuracies in my description of qemu's workings - I've only recently started looking at it directly, rather than through the lens of libvirt)

libvirt implements a "domain restore" operation by:

0) start with a previously saved domain image in a file

1) open the domain image, and connect it to a pipe

2) fork, connect the pipe to stdin, and exec qemu with "-incoming exec:cat"

3) execute "cont" in that qemu's monitor.

(for those familiar with the code, see src/qemu/qemu_driver.c:qemudDomainRestore() in the libvirt source).
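
For readers not familiar with the libvirt code, here is a minimal sketch of that sequence - not libvirt's actual code; the image path, the qemu binary name and arguments, and the monitor socket are made-up placeholders, and error handling is mostly omitted:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder path; libvirt uses the image named in the restore request. */
    int imgfd = open("/var/lib/libvirt/qemu/save/guest.save", O_RDONLY);
    int pfd[2];
    char buf[65536];
    ssize_t n;

    if (imgfd < 0 || pipe(pfd) < 0) {
        perror("setup");
        return 1;
    }

    if (fork() == 0) {
        /* Child: the read end of the pipe becomes stdin, and
         * "-incoming exec:cat" makes qemu read the migration
         * stream from it. */
        close(pfd[1]);
        dup2(pfd[0], STDIN_FILENO);
        execlp("qemu-kvm", "qemu-kvm",
               "-m", "512",
               "-monitor", "unix:/tmp/guest-monitor,server,nowait",
               "-incoming", "exec:cat",
               (char *)NULL);
        _exit(127);
    }

    /* Parent: feed the saved image into the pipe ... */
    close(pfd[0]);
    while ((n = read(imgfd, buf, sizeof(buf))) > 0)
        write(pfd[1], buf, n);
    close(pfd[1]);

    /* ... and then connect to /tmp/guest-monitor and send "cont\n".
     * Nothing here waits for qemu to finish loading the incoming
     * stream before that happens, which is where the race described
     * below comes in. */
    return 0;
}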

Although this works for most people, I'm consistently seeing a problem on my particular hardware (Intel Core 2 Duo 2.2GHz) that causes the domain restore to fail. It seems that the "cont" command takes effect before the restore is complete (possibly, or even probably, before it starts?), resulting in a failed restore - the domain is left in some random state, sometimes rebooting spontaneously, sometimes just hung.

If I insert a usleep(250 * 1000) between starting up qemu with "-incoming exec:cat" and issuing "cont" to start the CPUs, the restore is successful 100% of the time.
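
In terms of the sketch above, the fix is nothing more than a usleep() in front of the monitor step. A rough sketch of that step follows (again with a made-up socket path, a single blunt read of the monitor's reply, and no claim to be the code libvirt actually runs):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send one command to qemu's human monitor on a UNIX socket and
 * capture whatever it prints back. */
static int monitor_command(const char *cmd, char *reply, size_t len)
{
    struct sockaddr_un addr;
    ssize_t n;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/guest-monitor", sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }

    write(fd, cmd, strlen(cmd));
    n = read(fd, reply, len - 1);
    reply[n > 0 ? n : 0] = '\0';
    close(fd);
    return 0;
}

int main(void)
{
    char reply[4096];

    /* Empirical workaround: give qemu a head start on loading the
     * incoming stream before asking it to start the CPUs. */
    usleep(250 * 1000);
    return monitor_command("cont\n", reply, sizeof(reply)) < 0;
}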

I've been told that once the incoming migration starts, the monitor is non-responsive until it completes. This should mean that as long as "cont" isn't issued until after the migration has started, it will block until the migration is complete, protecting us from the race; for this reason (along with the fact that a 250 msec sleep is enough to cure the problem), I suspect the "cont" is arriving before the migration even starts.

There is, of course, an "info migrate" command in the monitor that could be used to ensure the migration had completed before issuing "cont", but that command only works for outgoing migrations, not incoming (presumably, if it were available, querying it before the migration started would return "not started" or something similar, and once the migration had started, the entire monitor interface would block until it completed).
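
If such a query did work on the incoming side, the natural fix would be to poll it instead of sleeping blindly - something along the lines of the loop below, which reuses the monitor_command() sketch above and assumes the reply would contain the same "completed" status text that an outgoing migration reports. Neither assumption holds today, which is exactly the gap:

#include <string.h>
#include <unistd.h>

/* Hypothetical: wait for the (incoming) migration to be reported
 * complete before starting the CPUs.  "info migrate" does not
 * currently report incoming migrations, so this only illustrates
 * the missing piece. */
static void wait_for_migration_then_cont(void)
{
    char reply[4096];

    for (;;) {
        if (monitor_command("info migrate\n", reply, sizeof(reply)) < 0)
            return;
        if (strstr(reply, "completed"))
            break;
        usleep(100 * 1000);   /* poll every 100 msec */
    }
    monitor_command("cont\n", reply, sizeof(reply));
}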

Can someone provide any insight on why it is possible to start the CPUs in the domain before the incoming migration is complete, and what we can do (other than blindly sleeping) to prevent this?



