Re: [Qemu-devel] Re: [PATCH 3/3] disk: don't read from disk until the gu

From: Kevin Wolf
Subject: Re: [Qemu-devel] Re: [PATCH 3/3] disk: don't read from disk until the guest starts
Date: Mon, 13 Sep 2010 22:03:28 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100907 Fedora/3.0.7-1.fc12 Thunderbird/3.0.7

On 13.09.2010 21:29, Stefan Hajnoczi wrote:
> On Mon, Sep 13, 2010 at 3:13 PM, Kevin Wolf <address@hidden> wrote:
>> On 13.09.2010 15:42, Anthony Liguori wrote:
>>> On 09/13/2010 08:39 AM, Kevin Wolf wrote:
>>>>> Yeah, one of the key design points of live migration is to minimize the
>>>>> number of failure scenarios where you lose a VM.  If someone typed the
>>>>> wrong command line or shared storage hasn't been mounted yet and we
>>>>> delay failure until live migration is in the critical path, that would
>>>>> be terribly unfortunate.
>>>> We would catch most of them if we try to open the image when migration
>>>> starts and immediately close it again until migration is (almost)
>>>> completed, so that no other code can possibly use it before the source
>>>> has really closed it.
>>> I think the only real advantage is that we fix NFS migration, right?
>> That's the one that we know about, yes.
>> The rest is not a specific scenario, but a strong feeling that having an
>> image opened twice at the same time feels dangerous. As soon as an
>> open/close sequence writes to the image for some format, we probably
>> have a bug. For example, what about this mounted flag that you were
>> discussing for QED?
> There is some room left to work in, even if we can't check in open().
> One idea would be to do the check asynchronously once I/O begins.  It
> is actually easy to check L1/L2 tables as they are loaded.
> The only barrier relationship between I/O and checking is that an
> allocating write (which will need to update L1/L2 tables) is only
> allowed after check completes.  Otherwise reads and non-allocating
> writes may proceed while the image is not yet fully checked.  We can
> detect when a table element is an invalid offset and discard it.

I'm not even talking about such complicated things. You wanted to have a
dirty flag in the header, right? So when we allow opening an image
twice, you get this sequence with migration:

Source: open
Destination: open (with dirty image)
Source: close

The image is now marked as clean, even though the destination is still
working on it.
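
To make the sequence concrete, here is a minimal sketch of the race. It is
not QEMU or QED code: the ImageHeader and ImageHandle classes and the
assumption that open() sets the dirty flag and close() clears it are purely
illustrative stand-ins for whatever the format would actually do.

```python
class ImageHeader:
    """Stands in for the on-disk header shared by every opener of the image."""

    def __init__(self):
        self.dirty = False


class ImageHandle:
    """One process's open instance of the image (hypothetical, not QEMU code)."""

    def __init__(self, name, header):
        self.name = name
        self.header = header
        self.is_open = False
        self.saw_dirty = False

    def open(self):
        # Assumption: open records whether the image was already dirty,
        # then marks it dirty for the duration of use.
        self.saw_dirty = self.header.dirty
        self.header.dirty = True
        self.is_open = True

    def close(self):
        # Assumption: a clean close clears the flag unconditionally,
        # without knowing whether anyone else still has the image open.
        self.header.dirty = False
        self.is_open = False


def migration_sequence():
    header = ImageHeader()
    source = ImageHandle("source", header)
    destination = ImageHandle("destination", header)

    source.open()        # Source: open
    destination.open()   # Destination: open (with dirty image)
    source.close()       # Source: close
    return header, destination


header, destination = migration_sequence()
print("destination still open:", destination.is_open)
print("header marked dirty:   ", header.dirty)
```

Under these assumptions the destination is still open at the end, yet the
header says the image is clean, so a crash on the destination from here on
would go undetected.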

