[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap

From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH for-3.0] file-posix: Fix write_zeroes with unmap on block devices
Date: Thu, 26 Jul 2018 10:23:27 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0

On 07/26/2018 10:06 AM, Kevin Wolf wrote:

+    ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                       aiocb->aio_offset, aiocb->aio_nbytes);

Umm, doesn't this have to use FALLOC_FL_ZERO_RANGE? FALLOC_FL_PUNCH_HOLE
deallocs, but is not required to write zeroes.

Yes, it is. See the man page:

     Specifying the FALLOC_FL_PUNCH_HOLE flag (available since Linux
     2.6.38) in mode deallocates space (i.e., creates a hole) in the byte
     range starting at offset and continuing for len bytes. Within the
     specified range, partial filesystem blocks are zeroed, and whole
     filesystem blocks are removed from the file. After a successful
     call, subsequent reads from this range will return zeroes.

That's true for file-system fds, but not for block device fds.

As pointed out by Nir,

> https://patchwork.kernel.org/patch/9903757/
Which says, among other things:

>> Do we also know that the blocks were discarded as we do with
> There never was a way to know for sure.
> ATA DSM TRIM and SCSI UNMAP are hints by definition. We attempted to
> bend their semantics towards getting predictable behavior but ultimately
> failed. Too many corner cases.
>> As I mentioned before. We relied on discard_zeroes_data in mkfs.ext4
>> to make sure that inode tables are zeroed after discard.
> The point is that you shouldn't have an if (discard_zeroes_data)
> conditional in the first place.
>  - If you need to dellocate a block range and you don't care about its
>    contents in the future, use BLKDISCARD / FL_PUNCH_HOLE.
>  - If you need to zero a block range, use BLKZEROOUT / FL_ZERO_RANGE.

PUNCH_HOLE deallocates; but can only guarantee a read back of zero on file systems.

Hmm - that thread also mentions FALLOC_FL_NO_HIDE_STALE, which is a new flag not present/documented on Fedora 28. I wonder if it helps, too.

FALLOC_FL_ZERO_RANGE in contrast implements write_zeroes without unmap.

I thought the opposite: FALLOC_FL_ZERO_RANGE guarantees that you read back 0, using whatever is most efficient under the hood (in the case of block devices, unmapping that reliably reads back as zero is favored).

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

reply via email to

[Prev in Thread] Current Thread [Next in Thread]