From: Eric Blake
Subject: Re: [Qemu-devel] [Bug 1776920] Re: qemu-img convert on Mac OSX creates corrupt images
Date: Fri, 7 Sep 2018 17:05:25 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/07/2018 04:12 PM, Bruno Haible wrote:
Why would that be important? As far as I understand the SEEK_DATA facility
from the man page [1], the most immediate way to make reasonable use of it is
to call
    offset = lseek (fd, offset, SEEK_DATA);
    offset = lseek (fd, offset, SEEK_HOLE);
alternately. On Linux, you may start with SEEK_DATA or SEEK_HOLE; on
macOS, you would need to start with SEEK_HOLE because starting with
SEEK_DATA won't work. Is it a _that_ big problem?

If you are doing a single pass over a file starting from offset 0, then yes, alternating between SEEK_DATA first and SEEK_HOLE second will visit the entire file, with every seek starting from an extent boundary and thus not triggering the bug at hand (and yes, that order is important, because of Solaris - read on to see why starting with SEEK_HOLE at offset 0 is a bad idea). And on macOS, SEEK_DATA at offset 0 returns 0 if there is no leading hole - the bug at hand is only triggered when you query an offset that does not start an extent, but 0 always starts an extent.

But if you are doing random-access reads of portions of the file, and want to know whether a given offset lies within data or a hole (and the file is not being modified by another parallel process), and do not already know whether your offset lies on an extent boundary, then this bug is nasty. Let's consider your options.
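For reference, the single-pass case described above can be sketched roughly as follows. This is only an illustration, not code from qemu or gnulib: count_data_bytes() is a hypothetical helper, and the sketch assumes a platform where lseek() supports SEEK_DATA and SEEK_HOLE (on glibc that means defining _GNU_SOURCE before the first include). Every query starts at offset 0 or at an offset returned by the previous call, so it never asks about the middle of an extent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes SEEK_DATA/SEEK_HOLE only with this */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#if !defined(SEEK_DATA) && defined(__linux__)
/* Fallback in case the feature macro came too late; these are the Linux values. */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/* Hypothetical helper: sum the sizes of all data extents in fd.
 * Returns the total, or -1 on an unexpected error. */
static off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    assert(fd >= 0);
    for (;;) {
        /* SEEK_DATA first: find the start of the next data extent. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            /* ENXIO means no data at or after pos (trailing hole or EOF). */
            return errno == ENXIO ? total : -1;
        }
        /* SEEK_HOLE second: find where that data extent ends. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole <= data) {
            return -1; /* failure, or no forward progress */
        }
        total += hole - data;
        pos = hole; /* always an extent boundary */
    }
}
```

Calling count_data_bytes() on a descriptor opened read-only is enough; the seeks move the file position but do not read any data.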

On Linux, if you call both SEEK_DATA and SEEK_HOLE on an offset that is in bounds, then you will always have one of the two calls return the same offset back.

On Solaris, if you call both, one of the two calls will return the same offset, except in the special case where the file ends in a hole and your offset lies in that final hole (then DATA fails with ENXIO, while HOLE returns the end of the file instead of the current offset). And that's why starting with SEEK_HOLE at offset 0 is insufficient - if the answer is larger than 0, you still don't know if the file starts with data, or is composed of a single hole, without making a second syscall.

If you want to know where the current data/hole ends, then making both calls gets you that answer every time. But if all you care about is whether you are in data or a hole, and not where it ends, the fact that one of the two calls should return the same offset means you can optimize and make a single SEEK_DATA query to learn where you are in the file (if it returns the same offset, you are in data; if it returns a different offset or fails with ENXIO, you are in a hole). True, you often need to know where the current extent ends, but if DATA returned a different answer, then you already know you are in a hole and where the hole ends without having to check HOLE. Also, there are cases where you know the file system uses 64k as its minimum hole size and you need to read less than 64k of data at your starting offset; then you don't need to query for the end (since you won't hit an extent flip in the meantime).
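To make the single-query optimization concrete, here is a minimal sketch of how the result of one SEEK_DATA call could be interpreted. The names (enum seek_verdict, classify_by_seek_data) are made up for illustration, and the function assumes the queried offset is within the file and that the platform follows the Linux/Solaris pattern described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

enum seek_verdict { IN_DATA, IN_HOLE, QUERY_FAILED };

/* Interpret the outcome of lseek(fd, offset, SEEK_DATA).
 * result is the return value, err is errno right after the call.
 * On IN_HOLE, *hole_end is where data resumes, or -1 if the hole
 * runs to the end of the file (ENXIO). */
static enum seek_verdict classify_by_seek_data(off_t offset, off_t result,
                                               int err, off_t *hole_end)
{
    assert(hole_end != NULL);
    *hole_end = -1;

    if (result == (off_t)-1) {
        /* ENXIO: no data at or after offset, so this is a trailing hole. */
        return err == ENXIO ? IN_HOLE : QUERY_FAILED;
    }
    if (result == offset) {
        return IN_DATA; /* same offset back: we are inside data */
    }
    *hole_end = result; /* data resumes here, so the hole ends here */
    return IN_HOLE;
}
```

A caller would pass in the return value of lseek(fd, offset, SEEK_DATA) together with errno. Note that this shortcut is only as trustworthy as the platform's SEEK_DATA behavior.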

That is, until macOS comes along, and now both queries return an offset different from your input, but neither fails. If you optimized by calling SEEK_DATA first, you end up treating the current offset as a hole (data loss). And if you make both calls looking for the POSIX-specified patterns, your logic can be thrown off (at which point the only sane response is to treat SEEK_HOLE as broken, and read the entire file rather than benefiting from skipping reads of holes). And if you swap things to call SEEK_HOLE instead of SEEK_DATA first, you run into the issue with Solaris behavior on trailing holes.
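One way to guard against this, sketched under the assumptions in this mail rather than taken from any existing code base, is to make both queries and check that the pair of answers matches one of the expected patterns. The names (enum probe_result, interpret_probe) are hypothetical; the inputs are the results of SEEK_DATA and SEEK_HOLE at the same in-bounds offset.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

enum probe_result { PROBE_DATA, PROBE_HOLE, PROBE_INCONSISTENT, PROBE_ERROR };

/* data_res/data_err: return value and errno of lseek(fd, offset, SEEK_DATA).
 * hole_res: return value of lseek(fd, offset, SEEK_HOLE). */
static enum probe_result interpret_probe(off_t offset, off_t data_res,
                                         int data_err, off_t hole_res)
{
    assert(offset >= 0);

    if (hole_res == (off_t)-1) {
        return PROBE_ERROR; /* SEEK_HOLE should not fail for an in-bounds offset */
    }
    if (data_res == (off_t)-1) {
        /* Trailing hole: Linux returns offset for HOLE, Solaris returns EOF. */
        return data_err == ENXIO ? PROBE_HOLE : PROBE_ERROR;
    }
    if (data_res == offset && hole_res > offset) {
        return PROBE_DATA; /* in data; hole_res is where it ends */
    }
    if (hole_res == offset && data_res > offset) {
        return PROBE_HOLE; /* in a hole; data_res is where it ends */
    }
    /* Neither call returned the input offset: the pattern seen on macOS. */
    return PROBE_INCONSISTENT;
}
```

On PROBE_INCONSISTENT the safe fallback is the one mentioned above: stop trusting the hole information and read the whole range as data.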

As for why random-access determination of data/hole is even useful, it helps to understand what qemu is doing. It uses the qcow2 format, which remaps a sparse guest view into a compact host file; reading sequential guest addresses can indeed read out of order on the host file. More importantly, you tend to start reading at guest offset 0, but host offset 0 is always the qcow2 header, so the very first read of guest data will occur at a host offset larger than 0 - which makes it very likely that the first address for a SEEK_DATA query is indeed not aligned to an extent boundary.

Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
