qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for Parallels format
Date: Thu, 23 Apr 2015 12:09:30 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 23.04.2015 um 11:47 hat Denis V. Lunev geschrieben:
> On 23/04/15 12:32, Kevin Wolf wrote:
> >Am 23.04.2015 um 11:20 hat Stefan Hajnoczi geschrieben:
> >>On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote:
> >>>On 22/04/15 16:08, Stefan Hajnoczi wrote:
> >>>>On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote:
> >>>>>+static int64_t allocate_cluster(BlockDriverState *bs, int64_t 
> >>>>>sector_num)
> >>>>>+{
> >>>>>+    BDRVParallelsState *s = bs->opaque;
> >>>>>+    uint32_t idx, offset, tmp;
> >>>>>+    int64_t pos;
> >>>>>+    int ret;
> >>>>>+
> >>>>>+    idx = sector_num / s->tracks;
> >>>>>+    offset = sector_num % s->tracks;
> >>>>>+
> >>>>>+    if (idx >= s->catalog_size) {
> >>>>>+        return -EINVAL;
> >>>>>+    }
> >>>>>+    if (s->catalog_bitmap[idx] != 0) {
> >>>>>+        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + 
> >>>>>offset;
> >>>>>+    }
> >>>>>+
> >>>>>+    pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
> >>>>>+    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
> >>>>>+    s->catalog_bitmap[idx] = pos / s->off_multiplier;
> >>>>>+
> >>>>>+    tmp = cpu_to_le32(s->catalog_bitmap[idx]);
> >>>>>+
> >>>>>+    ret = bdrv_pwrite_sync(bs->file,
> >>>>>+            sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, 
> >>>>>sizeof(tmp));
> >>>>What is the purpose of the sync?
> >>>This is necessary to preserve image consistency on crash from
> >>>my point of view. There is no check consistency at the moment.
> >>>The sync will be removed later when proper crash detection
> >>>code will be added (patches 19, 20, 21)
> >>Let's look at possible orderings in case of failure:
> >>
> >>A. BAT update
> >>B. Data write
> >>
> >>This sync enforces A, B ordering.  If we can see B, then A must also
> >>have happened thanks to the sync.
> >>
> >>But A, B ordering is too conservative.  Imagine B, A ordering and the
> >>failure where we crash before A.  It means we wrote the data but never
> >>linked it into the BAT.
> >>
> >>What happens in that case?  We've leaked a cluster in the underlying
> >>image file but it doesn't corrupt the visible disk from the guest
> >>point-of-view.
> >>
> >>Because your implementation uses truncate to extend the file size before
> >>A, even the A, B failure case results in a leaked cluster.  So the B, A
> >>case is not worse in any way.
> >>
> >>Why do other image formats sync cluster allocation updates?  Because
> >>they support backing files and in that case an A, B ordering results in
> >>data corruption so they enforce B, A ordering (the opposite of what
> >>you're trying to do!).
> >>
> >>The reason why A, B ordering results in data corruption when backing
> >>files are in use is because the guest's write request might touch only a
> >>subset of the cluster (a couple of sectors out of the whole cluster).
> >>So the guest needs to copy the remaining sectors from the backing file.
> >>If there is a dangling BAT entry like in the A, B failure case, then the
> >>guest will see a zeroed cluster instead of the contents of the backing
> >>file.  This is a data corruption, but only if a backing file is being
> >>used!
> >>
> >>So the sync is not necessary, both A, B and B, A ordering work for
> >>block/parallels.c.
> >Actually, I suspect this means that the parallels driver is restricted
> >to protocols with bdrv_has_zero_init() == true, otherwise zeros can turn
> >into random data (which means that it can't work e.g. directly on host
> >block devices).
> >
> >Do we enforce this?
> >
> >Kevin
> this is fixed in the patch 26 when the code is replaced with
> 
> +    if (s->data_end + s->tracks > pos) {
> +        int ret;
> +        if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
> +            ret = bdrv_write_zeroes(bs->file, s->data_end,
> +                                    s->prealloc_size, 0);
> +        } else {
> +            ret = bdrv_truncate(bs->file,
> +                    (s->data_end + s->prealloc_size) << BDRV_SECTOR_BITS);
> +        }
> +        if (ret < 0) {
> +            return ret;
> +        }
> +    }
> 
> on a default path, but you are correct. Some checking is
> necessary to be on a safe side.

Ah, yes, you're right. That should fix it for the default settings. The
check is only needed when using the bdrv_truncate() path.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]