qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handlin


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v2 3/4] raw-posix: Fix try_seek_hole()'s handling of SEEK_DATA failure
Date: Fri, 14 Nov 2014 14:12:20 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Eric Blake <address@hidden> writes:

> On 11/13/2014 08:29 AM, Eric Blake wrote:
>
>>>> lseek() with SEEK_DATA starting in a hole when there is no data until
>>>> EOF is actually the part that isn't documented in the man page, but
>>>> ENXIO is what I'm seeing here on RHEL 7.
>>>
>>> Here's the (proposed) POSIX wording:
>>>
>>> http://austingroupbugs.net/view.php?id=415
>>>
>>> And ENXIO is indeed the expected error for SEEK_DATA on a trailing hole,
>>> so maybe we should special case it.
>>>
>> 
>> Uggh.  Historical practice on Solaris (and therefore the POSIX wording)
>> says that SEEK_HOLE in a trailing hole is allowed (but not required) to
>> seek to EOF instead of reporting the offset requested.  I have no clue
>> why this was done, but it is VERY annoying - it means that if you
>> provide an offset within a tail hole of a file, you cannot reliably tell
>> if the file ends in a hole or with data, without ALSO trying SEEK_DATA.
>>  For applications that are reading a file sequentially but skipping over
>> holes, this behavior is fine (it short-circuits the hole/data search
>> points and might shave an iteration off a lop).  But for OUR purposes,
>> where we are merely trying to ascertain whether we are in a hole, we
>> have an inaccurate response - since SEEK_HOLE does NOT return the offset
>> we passed in, we are prone to treat the offset as belonging to data,
>> which is a pessimization (you never get wrong results by treating a hole
>> as data and reading it, but it is definitely slower).
>> 
>> I think you HAVE to call lseek() twice, both with SEEK_HOLE and with
>> SEEK_DATA, if you want to accurately determine whether an offset happens
>> to live within a trailing hole.
>
> Here's a table of possible situations, based solely on POSIX wording
> (and not on actual tests on Solaris or Linux, although it shouldn't be
> too hard to confirm behavior):
>
> 0-length file:
> lseek(fd, 0, SEEK_HOLE) => -1 ENXIO
> lseek(fd, 0, SEEK_DATA) => -1 ENXIO
> conclusion: 0 is at EOF

Isn't this a special case of the next one?

> file of any size:
> lseek(fd, size_or_larger, SEEK_HOLE) => -1 ENXIO
> lseek(fd, size_or_larger, SEEK_DATA) => -1 ENXIO
> conclusion: size_or_larger is at or beyond EOF
>
> file where offset is in a hole, but data appears later:
> lseek(fd, offset, SEEK_HOLE) => offset
> lseek(fd, offset, SEEK_DATA) => end_of_hole
> conclusion: offset through end_of_hole is in a hole
>
> file where offset is data, whether or not a hole appears later:
> lseek(fd, offset, SEEK_HOLE) => end_of_data
> lseek(fd, offset, SEEK_DATA) => offset
> conclusion: offset through end_of_data is in data
>
> file where offset is in a tail hole, option 1:
> lseek(fd, offset, SEEK_HOLE) => offset
> lseek(fd, offset, SEEK_DATA) => -1 ENXIO
> conclusion: offset through EOF is in hole, but another seek needed to
> learn EOF
>
> file where offset is in a tail hole, option 2:
> lseek(fd, offset, SEEK_HOLE) => EOF
> lseek(fd, offset, SEEK_DATA) => -1 ENXIO
> conclusion: offset through EOF is in hole, no additional seek needed
>
> The two calls are both necessary, in order to learn which extant type
> offset belongs to, and to tell where that extant ends; and the behaviors
> are distinguishable (if both lseek() succeed, we have both numbers we
> want; if both fail with ENXIO, we know the offset is at or beyond EOF;
> and if only SEEK_HOLE fails with ENXIO, we know we have a trailing
> hole); and we can tell at runtime what to do about a trailing hole (if
> the return value is offset, we need one more lseek(fd, 0, SEEK_END) to
> find EOF; if the return value is larger than offset, we have EOF for
> free).  You can optimize by calling SEEK_HOLE first (if it fails with
> ENXIO, there is no need to try SEEK_DATA); but SEEK_HOLE in isolation is
> insufficient to give you all the information you need.

Not discussed: how to handle failures other than ENXIO.

The appended code still avoids a second seek in one case.  Useful mostly
because it saves us from handling a second seek's contradictory
information.


/*
 * Find allocation range in @bs around offset @start.
 * May change underlying file descriptor's file offset.
 * If @start is not in a hole, store @start in @data, and the
 * beginning of the next hole in @hole, and return 0.
 * If @start is in a non-trailing hole, store @start in @hole and the
 * beginning of the next non-hole in @data, and return 0.
 * If @start is in a trailing hole or beyond EOF, return -ENXIO.
 * If we can't find out, return a negative errno other than -ENXIO.
 */
static int find_allocation(BlockDriverState *bs, off_t start,
                           off_t *data, off_t *hole)
{
#if defined SEEK_HOLE && defined SEEK_DATA
    BDRVRawState *s = bs->opaque;
    off_t offs;

    /*
     * SEEK_DATA cases:
     * D1. offs == start: start is in data
     * D2. offs > start: start is in a hole, next data at offs
     * D3. offs < 0, errno = ENXIO: either start is in a trailing hole
     *                              or start is beyond EOF
     *     If the latter happens, the file has been truncated behind
     *     our back since we opened it.  Best we can do is treat like
     *     a trailing hole.
     * D4. offs < 0, errno != ENXIO: we learned nothing
     */
    offs = lseek(s->fd, start, SEEK_DATA);
    if (offs < 0) {
        return -errno;          /* D3 or D4 */
    }
    assert(offs >= start);

    if (offs > start) {
        /* D2: in hole, next data at offs */
        *hole = start;
        *data = offs;
        return 0;
    }

    /* D1: in data, end not yet known */

    /*
     * SEEK_HOLE cases:
     * H1. offs == start: start is in a hole
     *     If this happens here, a hole has been dug behind our back
     *     since the previous lseek().
     * H2. offs > start: either start is in data, next hole at offs,
     *                   or start is in trailing hole, EOF at offs
     *     Linux treats trailing holes like any other hole: offs ==
     *     start.  Solaris seeks to EOF instead: offs > start (blech).
     *     If that happens here, a hole has been dug behind our back
     *     since the previous lseek().
     * H3. offs < 0, errno = ENXIO: start is beyond EOF
     *     If this happens, the file has been truncated behind our
     *     back since we opened it.  Treat it like a trailing hole.
     * H4. offs < 0, errno != ENXIO: we learned nothing
     *     Pretend we know nothing at all, i.e. "forget" about D1.
     */
    offs = lseek(s->fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;          /* D1 and (H3 or H4) */
    }
    assert(offs >= start);

    if (offs > start) {
        /*
         * D1 and H2: either in data, next hole at offs, or it was in
         * data but is now in a trailing hole.  Treating the latter as
         * if it there was data extending to EOF is safe, so simply do
         * that.
         */
        *data = start;
        *hole = offs;
        return 0;
    }

    /* D1 and H1 */
    return -EBUSY;
#else
    return -ENOTSUP;
#endif
}



reply via email to

[Prev in Thread] Current Thread [Next in Thread]