From: Denis V. Lunev
Subject: Re: [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_do_write_zeroes callback in bs
Date: Sat, 27 Dec 2014 23:14:35 +0300
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 27/12/14 23:01, Peter Lieven wrote:
On 27.12.2014 at 18:42, Denis V. Lunev wrote:
On 27/12/14 17:52, Peter Lieven wrote:
On 26.12.2014 at 20:15, Denis V. Lunev wrote:
On 26/12/14 16:32, Denis V. Lunev wrote:
On 26/12/14 16:13, Peter Lieven wrote:
On 26.12.2014 at 13:35, Denis V. Lunev wrote:
The check for maximum length was added by
     commit c31cb70728d2c0c8900b35a66784baa446fd5147
     Author: Peter Lieven <address@hidden>
     Date:   Thu Oct 24 12:06:58 2013 +0200
       block: honour BlockLimits in bdrv_co_do_write_zeroes

but if the driver provides a .bdrv_co_write_zeroes callback, there is
actually no need to limit the size to 32 MB. The callback should provide
an efficient implementation which normally does not physically write a
comparable amount of zeroes.

NACK.

First, there is no guarantee that bdrv_co_do_write_zeroes is a fast
operation. This heavily depends on circumstances that the block layer is
not aware of. If a specific protocol knows it is very fast at writing
zeroes under any circumstance, it should provide INT_MAX in
bs->bl.max_write_zeroes. It is then still allowed to return -ENOTSUP if
the request size or alignment doesn't fit.
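
For illustration, a minimal sketch of that pattern (the driver name and
the myfmt_zero_fast helper are hypothetical; the callback prototype
follows QEMU's block layer of this period):

    /* Advertise no practical per-request limit, somewhere in the
     * driver's open/refresh_limits path: */
    bs->bl.max_write_zeroes = INT_MAX;  /* "fast under any circumstance" */

    /* ...but still refuse requests this driver cannot handle fast, so
     * the generic block layer falls back to its slow path: */
    static int coroutine_fn myfmt_co_write_zeroes(BlockDriverState *bs,
                                                  int64_t sector_num,
                                                  int nb_sectors,
                                                  BdrvRequestFlags flags)
    {
        if ((sector_num | nb_sectors) & 7) {
            return -ENOTSUP;    /* misaligned for this (made-up) format */
        }
        return myfmt_zero_fast(bs, sector_num, nb_sectors);
    }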

The idea is that (from my point of view) if .bdrv_co_do_write_zeroes is
specified, the cost is almost the same for any amount of zeroes written.
In my view this is true for fallocate: the amount of data actually
written is several orders of magnitude less than requested, except on
the slow path, which honours the 32 MB limit.

If the operation is complex to implement, it will be rate-limited lower
down, in the actual implementation.
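
As a standalone illustration of why the cost can be almost independent
of the length (plain Linux fallocate(2), not QEMU code; the file name
and sizes are arbitrary):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <linux/falloc.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("test.img", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, 1LL << 30) < 0) {
            perror("setup");
            return 1;
        }
        /* Zero half of the 1 GB file without writing any data blocks:
         * the range is simply deallocated, so the cost is dominated by
         * a metadata update, not by the length of the range. */
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, 512LL << 20) < 0) {
            perror("fallocate");  /* not every filesystem supports this */
        }
        close(fd);
        return 0;
    }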

There are known backends, e.g. anything that deals with SCSI, that have
a hard limit on the number of zeroes they can write fast in a single
request. This number MUST NOT be exceeded. The patch below would break
all those backends.

Could you please point me to these backends? Actually, from my point of
view, they should set up max_write_zeroes for themselves. This is done
at least in block/iscsi.c, and it would be the consistent way of doing
so.
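
Roughly, the pattern in block/iscsi.c is to derive the cap from the
WRITE SAME limit reported by the target (a sketch, not a verbatim quote
of that file):

    /* publish the device's WRITE SAME length limit so the generic
     * block layer never sends a larger write_zeroes request */
    if (iscsilun->bl.max_ws_len < 0xffffffff) {
        bs->bl.max_write_zeroes =
            sector_lun2qemu(iscsilun->bl.max_ws_len, iscsilun);
    }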


What issue are you trying to fix with this patch? Maybe there is a
better way to fix it at another point in the code.


I am trying to minimize the number of metadata updates for a file. This
provides some speedup even on ext4, and it will provide even more
speedup with a distributed filesystem like Ceph, where file size updates
and block allocation are costly.

Regards,
      Den
First of all, the patch is really wrong :) It was written under wrong
assumptions.

OK. I have spent some time reading your original patchset and did not
find any useful justification for the default limit for either discard
or write zeroes.

32768 is the largest power of two fitting into a uint16.
And uint16 is quite common for nb_sectors in backends.
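
Indeed: 2^15 = 32768 <= 65535 = 2^16 - 1, while 2^16 = 65536 would wrap
to 0 in a uint16_t. A two-line check:

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint16_t fits = 32768;             /* 2^15: largest power of two <= 65535 */
        uint16_t wraps = (uint32_t)65536;  /* 2^16 wraps modulo 65536 to 0 */
        assert(fits == 32768 && wraps == 0);
        return 0;
    }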


OK, this could be reasonable.



Yes, there are drivers which require the block layer to call
.bdrv_co_do_write_zeroes with a certain alignment and an upper limit.
But in that case the driver sets up max_write_zeroes. In my opinion, all
buggy drivers should do that, so as not to affect the non-buggy ones.

This is the only purpose of the original patches for limits. I had
wrongly interpreted BlockLimits as something connected with the duration
of the operation. Sorry for that.

Therefore there is no good reason to limit the amount of data sent to
drv->bdrv_co_writev at all. The only concern is that we should not
allocate too much memory at once. We could do something like

    chunk = MIN(2048, num);
    base = qemu_try_blockalign(bs, chunk * BDRV_SECTOR_SIZE);
    memset(base, 0, chunk * BDRV_SECTOR_SIZE);
    for (added = 0; added < num; added += chunk) {
        chunk = MIN(chunk, num - added);
        qemu_iovec_add(qiov, base, chunk * BDRV_SECTOR_SIZE);
    }

to avoid really big allocations here even if .max_write_zeroes is very
high. Do you think this might be useful?
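
A self-contained illustration of the same idea using plain writev(2)
instead of QEMU's QEMUIOVector (buffer and batch sizes are arbitrary;
short-write handling is elided):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/uio.h>
    #include <unistd.h>

    #define SECTOR_SIZE   512
    #define CHUNK_SECTORS 2048   /* one 1 MB zeroed bounce buffer */
    #define IOV_BATCH     16     /* up to 16 MB per writev() call */

    /* Write num_sectors of zeroes while only ever allocating one small
     * buffer: every iovec entry points at the same zeroed chunk. */
    static int write_zeroes(int fd, uint64_t num_sectors)
    {
        void *base = calloc(CHUNK_SECTORS, SECTOR_SIZE);
        struct iovec iov[IOV_BATCH];

        if (base == NULL) {
            return -1;
        }
        while (num_sectors > 0) {
            int cnt = 0;
            while (num_sectors > 0 && cnt < IOV_BATCH) {
                uint64_t n = num_sectors < CHUNK_SECTORS
                           ? num_sectors : CHUNK_SECTORS;
                iov[cnt].iov_base = base;   /* same buffer every entry */
                iov[cnt].iov_len = n * SECTOR_SIZE;
                num_sectors -= n;
                cnt++;
            }
            if (writev(fd, iov, cnt) < 0) {
                free(base);
                return -1;
            }
        }
        free(base);
        return 0;
    }

    int main(void)
    {
        int fd = open("zeroes.img", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0 || write_zeroes(fd, 65536) < 0) {   /* 32 MB */
            perror("write_zeroes");
            return 1;
        }
        close(fd);
        return 0;
    }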

As for .bdrv_co_do_write_zeroes itself, can we still drop the default
32 MB limit? If there are buggy drivers, they should have
.max_write_zeroes specified.

The same applies to .max_discard.

It's always risky to change default behaviour. In the original
discussion we agreed that there should be a limit for each request. I
think the 2^15 was Paolo's suggestion.

You were talking about metadata updates for a file. So the operation
that is too slow for you is bdrv_write_zeroes inside a container file?
What is the underlying filesystem? What is the exact operation you are
trying to optimize?

I am wondering because, as far as I can see, write zeroes is only
supported for XFS and for block devices which support BLKZEROOUT. The
latter only works for cache=none. So it's not that easy to end up in an
optimized (fast) path anyway.
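
For reference, the BLKZEROOUT path looks like this from userspace (the
device path is a placeholder; the ioctl exists since Linux 3.7 and is
not supported by every device):

    #include <fcntl.h>
    #include <linux/fs.h>   /* BLKZEROOUT */
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        uint64_t range[2] = { 0, 1ULL << 20 };  /* byte offset, length */
        int fd = open("/dev/sdX", O_WRONLY);    /* placeholder device */

        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* Ask the kernel to zero the range; no data crosses userspace. */
        if (ioctl(fd, BLKZEROOUT, range) < 0) {
            perror("BLKZEROOUT");
        }
        close(fd);
        return 0;
    }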

Peter


You have missed the 6 patches below ;) e.g. patch 2/7.

OK. I'll redo the changes and fix this at the raw-posix level.

I was not in CC on the series. Can you please include me in CC for all
patches when you respin?


no prob :)


