qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 08/15] block: introduce preallocate filter


From: Max Reitz
Subject: Re: [PATCH v6 08/15] block: introduce preallocate filter
Date: Thu, 24 Sep 2020 18:50:38 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
> It's intended to be inserted between format and protocol nodes to
> preallocate additional space (expanding protocol file) on writes
> crossing EOF. It improves performance for file-systems with slow
> allocation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  docs/system/qemu-block-drivers.rst.inc |  26 ++
>  qapi/block-core.json                   |  20 +-
>  block/preallocate.c                    | 556 +++++++++++++++++++++++++
>  block/meson.build                      |   1 +
>  4 files changed, 602 insertions(+), 1 deletion(-)
>  create mode 100644 block/preallocate.c

Looks good to me in general.

[...]

> diff --git a/block/preallocate.c b/block/preallocate.c
> new file mode 100644
> index 0000000000..6510ad0149
> --- /dev/null
> +++ b/block/preallocate.c

[...]

> +/*
> + * Handle reopen.
> + *
> + * We must implement reopen handlers, otherwise reopen just don't work. 
> Handle
> + * new options and don't care about preallocation state, as it is handled in
> + * set/check permission handlers.
> + */
> +
> +static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
> +                                      BlockReopenQueue *queue, Error **errp)
> +{
> +    PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
> +
> +    if (!preallocate_absorb_opts(opts, reopen_state->options,
> +                                 reopen_state->bs->file->bs, errp)) {

I tried to find out whether this refers to the old file child, or the
post-reopen one.  As far as I could find out, there is no generic
implementation for changing the file child as part of x-blockdev-reopen:

{"error": {"class": "GenericError", "desc": "Cannot change the option
'file'"}}

Now that’s a shame.  That means you can’t reasonably integrate a
preallocate filter into an existing node graph unless the format driver
checks for the respective child option and issues a replace_node on
commit or something, right?  I suppose any driver who’d want to
implement child replacement would need to attach the new node in
prepare() as some pseudo-child, and then drop the old one and rename the
new one in commit().  I don’t think any driver does that yet, so I
suppose no format driver allows replacement of children yet (except for
quorum...).

Does anyone know what the status on that is?  Are there any plans for
implementing child replacement in reopen, or did I just miss something?

(It looks like the backing child can be replaced, but that’s probably
not a child where the preallocate filter would be placed on top...).

> +        g_free(opts);
> +        return -EINVAL;
> +    }
> +
> +    reopen_state->opaque = opts;
> +
> +    return 0;
> +}

[...]

> +/*
> + * Call on each write. Returns true if @want_merge_zero is true and the 
> region
> + * [offset, offset + bytes) is zeroed (as a result of this call or earlier
> + * preallocation).
> + *
> + * want_merge_zero is used to merge write-zero request with preallocation in
> + * one bdrv_co_pwrite_zeroes() call.
> + */
> +static bool coroutine_fn handle_write(BlockDriverState *bs, int64_t offset,
> +                                      int64_t bytes, bool want_merge_zero)
> +{
> +    BDRVPreallocateState *s = bs->opaque;
> +    int64_t end = offset + bytes;
> +    int64_t prealloc_start, prealloc_end;
> +    int ret;
> +
> +    if (!has_prealloc_perms(bs)) {

Took me a bit to figure out, because it takes a trip to
preallocate_child_perm() to figure out exactly when we’re going to have
the necessary permissions for this to pass.

Then it turns out that this is going to be the case exactly when the
parents collectively have the same permissions (WRITE+RESIZE) on the
preallocate node.

Then I had to wonder whether it’s appropriate not to preallocate if
WRITE is taken, but RESIZE isn’t.  But that seems entirely correct, yes.
 If noone is going to grow the file, then there is no need for
preallocation.  (Vice versa, if noone is going to write, but only
resize, then there is no need for preallocation either.)

So this seems correct, yes.

(It could be argued that if one parent has WRITE, and another has RESIZE
(but neither has both), then we probably don’t need preallocation
either.  But in such an arcane case (which is impossible to figure out
in .bdrv_child_perm() anyway), we might as well just do preallocation.
Won’t hurt.)

> +        /* We don't have state neither should try to recover it */
> +        return false;
> +    }
> +
> +    if (s->data_end < 0) {
> +        s->data_end = bdrv_getlength(bs->file->bs);
> +        if (s->data_end < 0) {
> +            return false;
> +        }
> +
> +        if (s->file_end < 0) {
> +            s->file_end = s->data_end;
> +        }
> +    }
> +
> +    if (end <= s->data_end) {
> +        return false;
> +    }
> +
> +    /* We have valid s->data_end, and request writes beyond it. */
> +
> +    s->data_end = end;
> +    if (s->zero_start < 0 || !want_merge_zero) {
> +        s->zero_start = end;

Skipping this on want_merge_zero == true means that zero writes can be
cached; if you repeatedly perform zero writes into the preallocated
area, then none of those will actually be executed.  I legitimately
don’t know whether that’s OK.

I suppose until someone tells me it isn’t OK, I’ll believe it is.

> +    }
> +
> +    if (s->file_end < 0) {
> +        s->file_end = bdrv_getlength(bs->file->bs);
> +        if (s->file_end < 0) {
> +            return false;
> +        }
> +    }
> +
> +    /* Now s->data_end, s->zero_start and s->file_end are valid. */
> +
> +    if (end <= s->file_end) {
> +        /* No preallocation needed. */
> +        return want_merge_zero && offset >= s->zero_start;
> +    }
> +
> +    /* Now we want new preallocation, as request writes beyond s->data_end. 
> */

s/data_end/file_end/

> +
> +    prealloc_start = want_merge_zero ? MIN(offset, s->file_end) : 
> s->file_end;

I suppose you intentionally use s->file_end here instead of @end, even
if offset <= s->file_end.  I just mention it because I wonder whether
it’s really better to effectively write twice to the same area in such
cases (once zeroes for preallocation, then immediately the data) instead
of only writing the data and then preallocating past it.

(Though if it were the same code just with @end instead of s->file_end
for offset <= s->file_end, then the order would be to preallocate past
@end, and then to write the data.  Which might be suboptimal in terms of
how the blocks are then ordered in the filesystem.)

> +    prealloc_end = QEMU_ALIGN_UP(offset + bytes + s->opts.prealloc_size,

s/offset + bytes/end/?

> +                                 s->opts.prealloc_align);
> +    s->file_end = end;

Why is this set here, when it’s always overwritten after
bdrv_co_pwrite_zeroes() anyway?

(It also seems a bit wrong, because at this point we don’t know yet
whether the data write is going to succeed, so we don’t know for sure
whether the file end is really going to be @end without the preallocation.)

> +
> +    ret = bdrv_co_pwrite_zeroes(
> +            bs->file, prealloc_start, prealloc_end - prealloc_start,
> +            BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
> +    if (ret < 0) {
> +        s->file_end = ret;
> +        return false;
> +    }
> +
> +    s->file_end = prealloc_end;
> +    return want_merge_zero;
> +}
> +
> +static int coroutine_fn preallocate_co_pwrite_zeroes(BlockDriverState *bs,
> +        int64_t offset, int bytes, BdrvRequestFlags flags)
> +{
> +    bool want_merge_zero =
> +        (flags & (BDRV_REQ_ZERO_WRITE | BDRV_REQ_NO_FALLBACK)) == flags;

Isn’t this the same as !(flags & ~(ZERO_WRITE | NO_FALLBACK))?  (Maybe
only I would find that simpler to understand, though.)

> +    if (handle_write(bs, offset, bytes, want_merge_zero)) {
> +        return 0;
> +    }
> +
> +    return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
> +}

[...]

> +static int coroutine_fn
> +preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
> +                        bool exact, PreallocMode prealloc,
> +                        BdrvRequestFlags flags, Error **errp)
> +{
> +    ERRP_GUARD();
> +    BDRVPreallocateState *s = bs->opaque;
> +    int ret;
> +
> +    if (s->data_end >= 0 && offset > s->data_end) {
> +        if (s->file_end < 0) {
> +            s->file_end = bdrv_getlength(bs->file->bs);
> +            if (s->file_end < 0) {
> +                error_setg(errp, "failed to get file length");
> +                return s->file_end;
> +            }
> +        }
> +
> +        if (prealloc == PREALLOC_MODE_FALLOC) {
> +            /*
> +             * If offset <= s->file_end, the task is already done, just
> +             * update s->file_end, to move part of "filter preallocation"

s/file_end/data_end/

> +             * to "preallocation requested by user".
> +             * Otherwise just proceed to preallocate missing part.
> +             */
> +            if (offset <= s->file_end) {
> +                s->data_end = offset;
> +                return 0;
> +            }

[...]

> +static int preallocate_check_perm(BlockDriverState *bs,
> +                                  uint64_t perm, uint64_t shared, Error 
> **errp)
> +{
> +    BDRVPreallocateState *s = bs->opaque;
> +
> +    if (s->data_end >= 0 && !can_write_resize(perm)) {
> +        /*
> +         * Loose permissions.

*Lose

(I assume)

> +         * We should truncate in check_perm, as in set_perm bs->file->perm 
> will
> +         * be already changed, and we should not violate it.
> +         */
> +        if (s->file_end < 0) {
> +            s->file_end = bdrv_getlength(bs->file->bs);
> +            if (s->file_end < 0) {
> +                error_setg(errp, "Failed to get file length");
> +                return s->file_end;
> +            }
> +        }
> +
> +        if (s->data_end < s->file_end) {
> +            int ret = bdrv_truncate(bs->file, s->data_end, true,
> +                                    PREALLOC_MODE_OFF, 0, NULL);
> +            if (ret < 0) {
> +                error_setg(errp, "Failed to drop preallocation");
> +                s->file_end = ret;
> +                return ret;
> +            }
> +        }
> +        /*
> +         * We will drop our permissions, as well as allow shared
> +         * permissions, anyone will be able to change the child, so mark
> +         * all states invalid. We'll regain control if get good permissions
> +         * back.
> +         */
> +        s->data_end = s->file_end = s->zero_start = -EINVAL;

Shouldn’t we clear these variables whenever !can_write_resize(perm), not
just if also s->data_end >= 0?

> +    }
> +
> +    return 0;
> +}

Max

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]