Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations

From:	Paolo Bonzini
Subject:	Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations
Date:	Mon, 12 Mar 2012 13:27:58 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1

On 10/03/2012 19:02, Richard Laager wrote:
> I propose adding the following behaviors in any event:
>       * If a QEMU block device reports a discard_granularity > 0, it
>         must be equal to 2^n (n >= 0), or QEMU's block core will change
>         it to 0. (Non-power-of-two granularities are not likely to exist
>         in the real world, and this assumption greatly simplifies
>         ensuring correctness.)

Yeah, I was considering this to be simply a bug in the block device.
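
A minimal sketch of the proposed check, assuming the granularity is a 32-bit byte count. The helper name sanitize_discard_granularity is made up for illustration and is not an existing QEMU function:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a QEMU API): keep a reported discard
 * granularity only if it is 0 or a power of two; otherwise fall
 * back to 0, i.e. treat the device as not supporting discard.
 */
static uint32_t sanitize_discard_granularity(uint32_t granularity)
{
    /* x & (x - 1) clears the lowest set bit, so it is 0 only for 2^n. */
    if (granularity != 0 && (granularity & (granularity - 1)) != 0) {
        return 0;
    }
    return granularity;
}
```

With this, a granularity of 4096 is kept, while a value such as 1536 is reset to 0.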

>       * For SCSI, report an unmap_granularity to the guest as follows:
>       max(logical_block_size, discard_granularity) / logical_block_size

This is more or less already in place later in the series.
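
For reference, the quoted formula can be written as below. This is only an illustration: the function name is hypothetical, both sizes are assumed to be in bytes, and the code is not taken from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: unmap granularity reported to the guest, in
 * units of logical blocks, computed as
 *   max(logical_block_size, discard_granularity) / logical_block_size
 */
static uint32_t scsi_unmap_granularity(uint32_t logical_block_size,
                                       uint32_t discard_granularity)
{
    uint32_t bytes;

    assert(logical_block_size > 0);   /* avoid dividing by zero */

    bytes = discard_granularity > logical_block_size
            ? discard_granularity : logical_block_size;
    return bytes / logical_block_size;
}
```

For example, 512-byte logical blocks with a 4096-byte discard granularity give 8 blocks. A discard granularity that is 0 or smaller than the logical block size gives 1.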

> As a design concept, instead of guaranteeing that 512B zero'ing discards
> are supported, I think the QEMU block layer should instead guarantee
> aligned discards to QEMU block devices, emulating any misaligned
> discards (or portions thereof) by writing zeroes if (and only if)
> discard_zeros_data is set.

Yes, this can be done of course.  This series does not include it yet.
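
One way the block layer could split a request is sketched below, assuming byte offsets and a power-of-two granularity. DiscardSplit and split_discard are hypothetical names, not part of this series. The head and tail would be written as zeroes only if discard_zeros_data is set, and simply dropped otherwise; only the middle part is passed to the driver as a discard.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: three consecutive byte ranges. */
typedef struct DiscardSplit {
    uint64_t head_offset, head_len;   /* misaligned head */
    uint64_t mid_offset, mid_len;     /* aligned part, safe to discard */
    uint64_t tail_offset, tail_len;   /* misaligned tail */
} DiscardSplit;

/* Split [offset, offset + len) on multiples of granularity (2^n). */
static DiscardSplit split_discard(uint64_t offset, uint64_t len,
                                  uint64_t granularity)
{
    DiscardSplit s = {0};
    uint64_t end = offset + len;
    uint64_t mask = granularity - 1;
    uint64_t aligned_start, aligned_end;

    assert(granularity != 0 && (granularity & mask) == 0);

    aligned_start = (offset + mask) & ~mask;   /* round up */
    aligned_end = end & ~mask;                 /* round down */

    if (aligned_start >= aligned_end) {
        /* No whole granule inside the request: it is all "head". */
        s.head_offset = offset;
        s.head_len = len;
        s.mid_offset = s.tail_offset = end;
        return s;
    }

    s.head_offset = offset;
    s.head_len = aligned_start - offset;
    s.mid_offset = aligned_start;
    s.mid_len = aligned_end - aligned_start;
    s.tail_offset = aligned_end;
    s.tail_len = end - aligned_end;
    return s;
}
```

For a request at offset 512 with length 8192 and a granularity of 4096, this yields a 3584-byte head, one 4096-byte aligned discard at offset 4096, and a 512-byte tail.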

> This leaves one remaining issue: In raw-posix.c, for files (i.e. not
> devices), I assume you're going to advertise discard_granularity=1 and
> discard_zeros_data=1 when compiled with support for
> fallocate(FALLOC_FL_PUNCH_HOLE). Note, I'm assuming fallocate() actually
> guarantees that it zeros the data when punching holes.

It does, that's pretty much the definition of a hole.

> If the guest does a big discard (think mkfs) and fallocate() returns
> EOPNOTSUPP, you'll have to zero essentially the whole virtual disk,
> which, as you noted, will also allocate it (unless you explicitly check
> for holes). This is bad. It can be avoided by not advertising
> discard_zeros_data, but as you noted, that's unfortunate.

If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
be done by skipping the zero write on known holes.

This could even be done at the block layer level using bdrv_is_allocated.

> If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
> advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
> going to work. This would side step these problems. 

... and introduce others when migrating if your datacenter doesn't have
homogeneous kernel versions and/or filesystems. :(

> You said it wasn't
> possible to probe for FALLOC_FL_PUNCH_HOLE. Have you considered probing
> by extending the file by one byte and then punching that:
>         char buf = 0;
>         fstat(s->fd, &st);
>         pwrite(s->fd, &buf, 1, st.st_size + 1);
>         has_discard = !fallocate(s->fd,
>                                  FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                                  st.st_size + 1, 1);
>         ftruncate(s->fd, st.st_size);

Nice trick. :)   Yes, that could work.

Do you know if non-Linux operating systems have something similar to
BLKDISCARDZEROES?

Paolo
