From: Juan Quintela
Subject: Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces qcow2 corruptions
Date: Tue, 15 Nov 2011 14:20:12 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)
Anthony Liguori <address@hidden> wrote:
> On 11/14/2011 04:16 AM, Daniel P. Berrange wrote:
>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>>>> Am 10.11.2011 22:30, schrieb Anthony Liguori:
>>>>> Live migration with qcow2 or any other image format is just not going to
>>>>> work
>>>>> right now even with proper clustered storage. I think doing a block
>>>>> level flush
>>>>> cache interface and letting block devices decide how to do it is the best
>>>>> approach.
>>>>
>>>> I would really prefer reusing the existing open/close code. It means
>>>> less (duplicated) code, is existing code that is well tested and doesn't
>>>> make migration much of a special case.
>>>>
>>>> If you want to avoid reopening the file on the OS level, we can reopen
>>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>>> and in 1.1 we can use bdrv_reopen().
>>>>
>>>
>>> Intuitively I dislike _reopen style interfaces. If the second open
>>> yields different results from the first, does it invalidate any
>>> computations in between?
>>>
>>> What's wrong with just delaying the open?
>>
>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
>> the ability to rollback to the source host upon open failure for most
>> deployed versions of libvirt. We only fairly recently switched to a five
>> stage migration handshake to cope with rollback when 'cont' fails.
>
> Delayed open isn't a panacea. With the series I sent, we should be
> able to migrate with a qcow2 file on coherent shared storage.
>
> There are two other cases that we care about: migration with nfs
> cache!=none and direct attached storage with cache!=none
>
> With NFS, whether the open is deferred matters less than whether the
> open happens after the close on the source. To fix NFS cache!=none, we
> would have to do a bdrv_close() before sending the last byte of
> migration data and make sure that we bdrv_open() after receiving the
> last byte of migration data.
>
> The problem with this IMHO is it creates a large window where no one
> has the file open and you're critically vulnerable to losing your VM.
Red Hat's NFS guru told us that an fsync() on the source followed by an
open() on the target is enough. But it still depends on nothing else
having the file open on the target.
> I'm much more in favor of a smarter caching policy. If we can fcntl()
> our way to O_DIRECT on NFS, that would be fairly interesting. I'm not
> sure if this is supported today but it's something we could look into
> adding in the kernel. That way we could force NFS to O_DIRECT during
> migration which would solve this problem robustly.
We would need O_DIRECT on the target during migration; I agree that
would work.
> Deferred open doesn't help with direct attached storage. There simple
> is no guarantee that there isn't data in the page cache.
Yep, I asked the clustered filesystem people how they fixed the
problem, because clustered filesystems have this problem too. After
lots of arm twisting, I got ioctl(BLKFLSBUF, ...), but that only works:
- on Linux
- on some block devices
So, we are back to square one.
> Again, I think defaulting DAS to cache=none|directsync is what makes
> the most sense here.
I think it is the only sane solution. Otherwise, we need to write the
equivalent of a lock manager, to know _who_ has the storage, and
distributed lock managers are a mess :-(
> We can even add a migration blocker for DAS with cache=on. If we can
> do dynamic toggling of the cache setting, then that's pretty friendly
> at the end of the day.
That could fix the problem too. The moment we start migration, we do an
fsync() and switch to O_DIRECT on all filesystems.
As you said, time to implement fcntl(O_DIRECT).
Later, Juan.