[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for
From: |
Paolo Bonzini |
Subject: |
Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations |
Date: |
Mon, 12 Mar 2012 13:27:58 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1 |
Il 10/03/2012 19:02, Richard Laager ha scritto:
> I propose adding the following behaviors in any event:
> * If a QEMU block device reports a discard_granularity > 0, it
> must be equal to 2^n (n >= 0), or QEMU's block core will change
> it to 0. (Non-power-of-two granularities are not likely to exist
> in the real world, and this assumption greatly simplifies
> ensuring correctness.)
Yeah, I was considering this to be simply a bug in the block device.
> * For SCSI, report an unmap_granularity to the guest as follows:
> max(logical_block_size, discard_granularity) / logical_block_size
This is more or less already in place later in the series.
> As a design concept, instead of guaranteeing that 512B zero'ing discards
> are supported, I think the QEMU block layer should instead guarantee
> aligned discards to QEMU block devices, emulating any misaligned
> discards (or portions thereof) by writing zeroes if (and only if)
> discard_zeros_data is set.
Yes, this can be done of course. This series does not include it yet.
> This leaves one remaining issue: In raw-posix.c, for files (i.e. not
> devices), I assume you're going to advertise discard_granularity=1 and
> discard_zeros_data=1 when compiled with support for
> fallocate(FALLOC_FL_PUNCH_HOLE). Note, I'm assuming fallocate() actually
> guarantees that it zeros the data when punching holes.
It does, that's pretty much the definition of a hole.
> If the guest does a big discard (think mkfs) and fallocate() returns
> EOPNOTSUPP, you'll have to zero essentially the whole virtual disk,
> which, as you noted, will also allocate it (unless you explicitly check
> for holes). This is bad. It can be avoided by not advertising
> discard_zeros_data, but as you noted, that's unfortunate.
If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
be done by skipping the zero write on known holes.
This could even be done at the block layer level using bdrv_is_allocated.
> If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
> advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
> going to work. This would side step these problems.
... and introduce others when migrating if your datacenter doesn't have
homogeneous kernel versions and/or filesystems. :(
> You said it wasn't
> possible to probe for FALLOC_FL_PUNCH_HOLE. Have you considered probing
> by extending the file by one byte and then punching that:
> char buf = 0;
> fstat(s->fd, &st);
> pwrite(s->fd, &buf, 1, st.st_size + 1);
> has_discard = !fallocate(s->fd, FALLOC_FL_PUNCH_HOLE |
> FALLOC_FL_KEEP_SIZE,
> st.st_size + 1, 1);
> ftruncate(s->fd, st.st_size);
Nice trick. :) Yes, that could work.
Do you know if non-Linux operating systems have something similar to
BLKDISCARDZEROES?
Paolo
- [Qemu-devel] [RFC PATCH 00/17] Improvements around discard and write zeroes, Paolo Bonzini, 2012/03/08
- [Qemu-devel] [RFC PATCH 02/17] qed: make write-zeroes bounce buffer smaller than a single cluster, Paolo Bonzini, 2012/03/08
- [Qemu-devel] [RFC PATCH 03/17] block: add discard properties to BlockDriverInfo, Paolo Bonzini, 2012/03/08
- [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Paolo Bonzini, 2012/03/08
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Kevin Wolf, 2012/03/09
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Paolo Bonzini, 2012/03/09
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Richard Laager, 2012/03/10
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations,
Paolo Bonzini <=
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Kevin Wolf, 2012/03/12
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Richard Laager, 2012/03/13
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Paolo Bonzini, 2012/03/14
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Kevin Wolf, 2012/03/14
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Paolo Bonzini, 2012/03/14
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Kevin Wolf, 2012/03/14
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Paolo Bonzini, 2012/03/14
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Kevin Wolf, 2012/03/14
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Christoph Hellwig, 2012/03/24
- Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations, Christoph Hellwig, 2012/03/24