[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-stable] [PATCH 2/2] block: Pass unaligned discard requests to
From: |
Peter Lieven |
Subject: |
Re: [Qemu-stable] [PATCH 2/2] block: Pass unaligned discard requests to drivers |
Date: |
Fri, 11 Nov 2016 11:58:17 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 |
Am 11.11.2016 um 00:10 schrieb Eric Blake:
> Discard is advisory, so rounding the requests to alignment
> boundaries is never semantically wrong from the data that
> the guest sees. But at least the Dell Equallogic iSCSI SANs
> has an interesting property that its advertised discard
> alignment is 15M, yet documents that discarding a sequence
> of 1M slices will eventually result in the 15M page being
> marked as discarded, and it is possible to observe which
> pages have been discarded.
>
> Between commits 9f1963b and b8d0a980, we converted the block
> layer to a byte-based interface that ultimately ignores any
> unaligned head or tail based on the driver's advertised
> discard granularity, which means that qemu 2.7 refuses to
> pass any discard request smaller than 15M down to the Dell
> Equallogic hardware. This is a slight regression in behavior
> compared to earlier qemu, where a guest executing discards
> in power-of-2 chunks used to be able to get every page
> discarded, but is now left with various pages still allocated
> because the guest requests did not align with the hardware's
> 15M pages.
>
> Since the SCSI specification says nothing about a minimum
> discard granularity, and only documents the preferred
> alignment, it is best if the block layer gives the driver
> every bit of information about discard requests, rather than
> rounding it to alignment boundaries early.
>
> Rework the block layer discard algorithm to mirror the write
> zero algorithm: always peel off any unaligned head or tail
> and manage that in isolation, then do the bulk of the request
> on an aligned boundary. The fallback when the driver returns
> -ENOTSUP for an unaligned request is to silently ignore that
> portion of the discard request; but for devices that can pass
> the partial request all the way down to hardware, this can
> result in the hardware coalescing requests and discarding
> aligned pages after all.
>
> Reported by: Peter Lieven <address@hidden>
> Signed-off-by: Eric Blake <address@hidden>
> CC: address@hidden
> ---
> block/io.c | 36 +++++++++++++++++++++++-------------
> 1 file changed, 23 insertions(+), 13 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index aa532a5..76c3030 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2421,7 +2421,7 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs,
> int64_t offset,
> {
> BdrvTrackedRequest req;
> int max_pdiscard, ret;
> - int head, align;
> + int head, tail, align;
>
> if (!bs->drv) {
> return -ENOMEDIUM;
> @@ -2444,19 +2444,15 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState
> *bs, int64_t offset,
> return 0;
> }
>
> - /* Discard is advisory, so ignore any unaligned head or tail */
> + /* Discard is advisory, but some devices track and coalesce
> + * unaligned requests, so we must pass everything down rather than
> + * round here. Still, most devices will just silently ignore
> + * unaligned requests (by returning -ENOTSUP), so we must fragment
> + * the request accordingly. */
> align = MAX(bs->bl.pdiscard_alignment, bs->bl.request_alignment);
> assert(align % bs->bl.request_alignment == 0);
> head = offset % align;
> - if (head) {
> - head = MIN(count, align - head);
> - count -= head;
> - offset += head;
> - }
> - count = QEMU_ALIGN_DOWN(count, align);
> - if (!count) {
> - return 0;
> - }
> + tail = (offset + count) % align;
>
> bdrv_inc_in_flight(bs);
> tracked_request_begin(&req, bs, offset, count, BDRV_TRACKED_DISCARD);
> @@ -2468,11 +2464,25 @@ int coroutine_fn bdrv_co_pdiscard(BlockDriverState
> *bs, int64_t offset,
>
> max_pdiscard = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pdiscard,
> INT_MAX),
> align);
> - assert(max_pdiscard);
> + assert(max_pdiscard >= bs->bl.request_alignment);
>
> while (count > 0) {
> int ret;
> - int num = MIN(count, max_pdiscard);
> + int num = count;
> +
> + if (head) {
> + /* Make a small request up to the first aligned sector. */
> + num = MIN(count, align - head);
> + head = (head + num) % align;
Is there any way that head is != 0 after this?
Peter