From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for Parallels format
Date: Thu, 23 Apr 2015 11:32:23 +0200
User-agent: Mutt/1.5.21 (2010-09-15)
On 23.04.2015 at 11:20, Stefan Hajnoczi wrote:
> On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote:
> > On 22/04/15 16:08, Stefan Hajnoczi wrote:
> > >On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote:
> > >>+static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector_num)
> > >>+{
> > >>+    BDRVParallelsState *s = bs->opaque;
> > >>+    uint32_t idx, offset, tmp;
> > >>+    int64_t pos;
> > >>+    int ret;
> > >>+
> > >>+    idx = sector_num / s->tracks;
> > >>+    offset = sector_num % s->tracks;
> > >>+
> > >>+    if (idx >= s->catalog_size) {
> > >>+        return -EINVAL;
> > >>+    }
> > >>+    if (s->catalog_bitmap[idx] != 0) {
> > >>+        return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset;
> > >>+    }
> > >>+
> > >>+    pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS;
> > >>+    bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS);
> > >>+    s->catalog_bitmap[idx] = pos / s->off_multiplier;
> > >>+
> > >>+    tmp = cpu_to_le32(s->catalog_bitmap[idx]);
> > >>+
> > >>+    ret = bdrv_pwrite_sync(bs->file,
> > >>+                           sizeof(ParallelsHeader) + idx * sizeof(tmp),
> > >>+                           &tmp, sizeof(tmp));
> > >What is the purpose of the sync?
> > This is necessary to preserve image consistency on crash, from
> > my point of view. There is no consistency check at the moment.
> > The sync will be removed later, when proper crash-detection
> > code is added (patches 19, 20, 21).
>
> Let's look at possible orderings in case of failure:
>
> A. BAT update
> B. Data write
>
> This sync enforces A, B ordering. If we can see B, then A must also
> have happened thanks to the sync.
>
> But A, B ordering is too conservative. Imagine B, A ordering and the
> failure where we crash before A. It means we wrote the data but never
> linked it into the BAT.
>
> What happens in that case? We've leaked a cluster in the underlying
> image file, but it doesn't corrupt the visible disk from the guest's
> point of view.
>
> Because your implementation uses truncate to extend the file size before
> A, even the A, B failure case results in a leaked cluster. So the B, A
> case is not worse in any way.
>
> Why do other image formats sync cluster allocation updates? Because
> they support backing files, and in that case an A, B ordering results in
> data corruption, so they enforce B, A ordering (the opposite of what
> you're trying to do!).
>
> The reason why A, B ordering results in data corruption when backing
> files are in use is that the guest's write request might touch only a
> subset of the cluster (a couple of sectors out of the whole cluster).
> So the driver needs to copy the remaining sectors from the backing file.
> If there is a dangling BAT entry like in the A, B failure case, then the
> guest will see a zeroed cluster instead of the contents of the backing
> file. This is data corruption, but only if a backing file is being
> used!
>
> So the sync is not necessary: both A, B and B, A ordering work for
> block/parallels.c.
Actually, I suspect this means that the parallels driver is restricted
to protocols where bdrv_has_zero_init() returns true; otherwise, zeros can
turn into random data (which means that it can't work, e.g., directly on
host block devices).
Do we enforce this?
Kevin