From: Daniel P. Berrange
Subject: Re: [Qemu-devel] race condition when exec'ing "qemu -incoming" followed by monitor "cont"
Date: Fri, 9 Apr 2010 17:45:02 +0100
User-agent: Mutt/1.4.1i
On Fri, Apr 09, 2010 at 12:03:54PM -0400, Laine Stump wrote:
> (Please forgive (and correct!) any inaccuracies in my description of
> qemu's workings - I've only recently started looking at it directly,
> rather than through the lens of libvirt)
>
> libvirt implements a "domain restore" operation by:
>
> 0) start with a previously saved domain image in a file
>
> 1) open the domain image, and connect it to a pipe
>
> 2) fork, connect the pipe to stdin, and exec qemu with "-incoming exec:cat"
>
> 3) execute "cont" in that qemu's monitor.
>
> (for those familiar with the code, you can look at the
> src/qemu/qemu_driver.c:qemudDomainRestore() in the libvirt source).
>
> Although this works successfully for most people, I'm consistently
> seeing a problem on my particular hardware (Intel Core 2 Duo 2.2GHz)
> that causes this domain restore to fail. It seems that the "cont"
> command takes effect before the restore is completed (possibly/probably
> before it even starts?) resulting in a failed restore - the domain is
> left in some random state, sometimes rebooting spontaneously, sometimes
> just hung.
>
> If I insert a usleep(250 * 1000) between starting up qemu with
> "-incoming exec:cat" and issuing "cont" to start the CPUs, the restore
> is successful 100% of the time.
>
> I've been told that once the incoming migration starts, the monitor will
> be non-responsive until it is complete. This should mean that as long as
> the "cont" isn't issued until after the migration starts, it will be
> blocked until the migration is complete, thus protecting us from the
> race; for this reason (along with the fact that a 250msec sleep is
> enough to cure the problem) I'm thinking it's likely the "cont" happens
> before the migration starts.
>
> There is, of course, an "info migrate" command in the monitor that could
> be used to assure the migration had completed before issuing "cont", but
> that command only works for outgoing migrations, not incoming
> (presumably if it was available, checking the info prior to the
> migration starting would return "not started" (or something similar),
> and once it had started, the entire monitor interface would block until
> the migrate was completed).
Yep, I'd really like to see an 'info migrate' or equivalent that works
for incoming migration. Even if it can't give us progress info, a plain
status report is important for detecting failure & completion.
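
To make concrete what such a status report would allow, here is a minimal
sketch in Python of how a management layer might gate "cont" on it. It is
purely illustrative: query_status and send_monitor_command are hypothetical
callables, and the status strings other than "not started" are assumptions.
As noted above, qemu offers no such query for incoming migration today.

```python
import time


def wait_for_incoming(query_status, timeout=30.0, interval=0.05):
    """Poll until the incoming migration reports completion.

    query_status is a HYPOTHETICAL callable returning one of
    "not started", "active", "completed" or "failed". These values are
    assumptions made for this sketch, not an existing qemu interface.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = query_status()
        if status == "completed":
            return
        if status == "failed":
            raise RuntimeError("incoming migration failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("incoming migration did not complete in time")
        # "not started" and "active" both mean: keep waiting.
        time.sleep(interval)


def restore_and_continue(query_status, send_monitor_command, **kwargs):
    """Issue "cont" only once the incoming migration has completed.

    send_monitor_command is a HYPOTHETICAL callable that writes one
    command to the qemu monitor.
    """
    wait_for_incoming(query_status, **kwargs)
    send_monitor_command("cont")
```

Unlike a fixed usleep(), this waits exactly as long as the migration needs,
and a failed restore is reported instead of starting the CPUs on a
half-loaded guest.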
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|