From: Igor Lvovsky
Subject: [Qemu-devel] [PATCH] Fix a race condition and non-leaf images growing in VMDK chains.
Date: Sun, 13 May 2007 04:13:20 -0700
Hi,
In this patch I fixed two issues:
1. A race condition during write operations on snapshots.
   We now write the grain of data first and update the L2 metadata
   afterwards, so the snapshot stays consistent even if the VM is
   destroyed in the middle of a write.
2. Non-leaf images growing during writes.
   Assume we have a snapshot chain (Base->Snap1->Snap2->...->Leaf) and we
   run a VM on the latest image of this chain (the leaf image).
   The non-leaf images of the chain grow during writes (most noticeable
   when the VM performs aggressive writes), which is incorrect behavior
   according to the VMDK spec.
   For every write to an offset not yet allocated in the active image,
   the active image must query its ancestors for that offset; if any of
   them holds it, the whole grain containing that offset must be read
   from the ancestor, modified, and written to the active image.
   The problem occurred in this read-from-ancestor/modify/write-to-active
   path when the ancestor was two or more generations above the active
   (leaf) image (not its direct parent): the grain was written into the
   ancestor's direct child instead of into the leaf.
   Fixed by always writing to the 'active' (leaf) image; see the sketch
   of the fixed write path below.
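
For illustration, here is a minimal sketch of the fixed write path. It is
not the patch itself: cow_write_grain() and its parameters (leaf_fd,
ancestor_fd, l2_entry_pos, grain_size, ...) are made-up names standing in
for the real block-vmdk.c internals; only the ordering matters: copy the
grain from the ancestor, modify it, write it into the leaf image, and only
then update the L2 entry.

/*
 * Minimal sketch of the fixed write path.  The function and parameter
 * names (cow_write_grain, leaf_fd, ancestor_fd, l2_entry_pos, ...) are
 * illustrative only; they are not the real block-vmdk.c internals.
 */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static int cow_write_grain(int leaf_fd,            /* active (leaf) image     */
                           int ancestor_fd,        /* image holding the grain */
                           off_t ancestor_grain_off,
                           off_t leaf_grain_off,   /* newly allocated in leaf */
                           off_t l2_entry_pos,     /* L2 entry in leaf image  */
                           const uint8_t *buf, size_t buf_off, size_t len,
                           size_t grain_size)
{
    uint8_t *grain = malloc(grain_size);
    if (!grain)
        return -1;

    /* 1. Read the whole grain from whichever ancestor holds it
     *    (it may be several generations above the leaf).          */
    if (pread(ancestor_fd, grain, grain_size, ancestor_grain_off) !=
        (ssize_t)grain_size)
        goto fail;

    /* 2. Apply the guest's modification to the in-memory copy.    */
    memcpy(grain + buf_off, buf, len);

    /* 3. Write the grain data into the *leaf* image first, never
     *    into an intermediate image of the chain ...              */
    if (pwrite(leaf_fd, grain, grain_size, leaf_grain_off) !=
        (ssize_t)grain_size)
        goto fail;

    /* 4. ... and only then publish it by updating the L2 entry.
     *    If the VM is destroyed between steps 3 and 4 the snapshot
     *    stays consistent: the L2 entry still points at old data.  */
    uint32_t l2_entry = (uint32_t)(leaf_grain_off >> 9);   /* in sectors */
    if (pwrite(leaf_fd, &l2_entry, sizeof(l2_entry), l2_entry_pos) !=
        (ssize_t)sizeof(l2_entry))
        goto fail;

    free(grain);
    return 0;

fail:
    free(grain);
    return -1;
}

Because the data write in step 3 always targets the leaf, the intermediate
images of the chain never grow.
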
Regards,
Igor Lvovsky
-----Original Message-----
From: address@hidden [mailto:address@hidden On Behalf Of Fabrice Bellard
Sent: Tuesday, January 16, 2007 9:36 PM
To: address@hidden
Subject: Re: [Qemu-devel] Race condition in VMDK (QCOW*) formats.
Well, it was never said that the QCOW* code was safe if you interrupted
QEMU at some point.
But I agree that it could be safer to write the sector first and update
the links after. It could be interesting to analyze the QCOW2 snapshot
handling too (what if QEMU is stopped during the creation of a snapshot?).
Regards,
Fabrice.
Igor Lvovsky wrote:
>
> Hi all,
>
> I have a doubt about a race condition during the *write operation on
> snapshots*.
>
> I think the problem exists in the VMDK and QCOW* formats (I haven't
> checked the others).
>
> An example from block_vmdk.c:
>
> static int vmdk_write(BlockDriverState *bs, int64_t sector_num,
>                       const uint8_t *buf, int nb_sectors)
> {
>     BDRVVmdkState *s = bs->opaque;
>     int ret, index_in_cluster, n;
>     uint64_t cluster_offset;
>
>     while (nb_sectors > 0) {
>         index_in_cluster = sector_num & (s->cluster_sectors - 1);
>         n = s->cluster_sectors - index_in_cluster;
>         if (n > nb_sectors)
>             n = nb_sectors;
>         cluster_offset = get_cluster_offset(bs, sector_num << 9, 1);
>         if (!cluster_offset)
>             return -1;
>         lseek(s->fd, cluster_offset + index_in_cluster * 512, SEEK_SET);
>         ret = write(s->fd, buf, n * 512);
>         if (ret != n * 512)
>             return -1;
>         nb_sectors -= n;
>         sector_num += n;
>         buf += n * 512;
>     }
>     return 0;
> }
>
> The get_cluster_offset(…) routine updates the L2 table in the metadata
> and returns the cluster_offset.  Only after that does vmdk_write()
> actually write the grain at the right place.
>
> So we have a timing hole here.
>
> Assume the VM performing the write operation is destroyed at this
> moment.  We are left with a corrupted image: the L2 table is updated,
> but the grain itself is missing.
>
> Regards,
> Igor Lvovsky
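
To make the window described above concrete, here is a simplified sketch of
the problematic ordering; racy_allocating_write() and its parameters
(img_fd, l2_entry_pos, data_pos, ...) are illustrative names, not the actual
block-vmdk.c code. The L2 entry is written before the grain data exists on
disk, so a crash in between leaves a dangling L2 entry.

/*
 * Simplified sketch of the problematic ordering only; the names
 * (racy_allocating_write, img_fd, l2_entry_pos, data_pos) are
 * illustrative, not the actual block-vmdk.c code.
 */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

static int racy_allocating_write(int img_fd,
                                 off_t l2_entry_pos, uint32_t new_l2_entry,
                                 off_t data_pos,
                                 const uint8_t *grain, size_t grain_size)
{
    /* Step 1 (in get_cluster_offset-like code): publish the new grain
     * by writing its L2 entry ...                                     */
    if (pwrite(img_fd, &new_l2_entry, sizeof(new_l2_entry), l2_entry_pos) !=
        (ssize_t)sizeof(new_l2_entry))
        return -1;

    /* <-- If the VM is destroyed here, the L2 entry already points at
     *     a grain that was never written: the image is corrupted.     */

    /* Step 2 (back in vmdk_write-like code): only now write the data. */
    if (pwrite(img_fd, grain, grain_size, data_pos) != (ssize_t)grain_size)
        return -1;

    return 0;
}
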
_______________________________________________
Qemu-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/qemu-devel
block-vmdk.diff
Description: block-vmdk.diff