qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_d


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCH 1/7] block: fix maximum length sent to bdrv_co_do_write_zeroes callback in bs
Date: Sat, 27 Dec 2014 15:52:22 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0

Am 26.12.2014 um 20:15 schrieb Denis V. Lunev:
> On 26/12/14 16:32, Denis V. Lunev wrote:
>> On 26/12/14 16:13, Peter Lieven wrote:
>>> Am 26.12.2014 um 13:35 schrieb Denis V. Lunev:
>>>> The check for maximum length was added by
>>>>    commit c31cb70728d2c0c8900b35a66784baa446fd5147
>>>>    Author: Peter Lieven <address@hidden>
>>>>    Date:   Thu Oct 24 12:06:58 2013 +0200
>>>>      block: honour BlockLimits in bdrv_co_do_write_zeroes
>>>>
>>>> but actually if driver provides .bdrv_co_write_zeroes callback, there is
>>>> no need to limit the size to 32 MB. Callback should provide effective
>>>> implementation which normally should not write any zeroes in comparable
>>>> amount.
>>>
>>> NACK.
>>>
>>> First there is no guarantee that bdrv_co_do_write_zeroes is a fast 
>>> operation.
>>> This heaviliy depends on several circumstances that the block layer is not 
>>> aware of.
>>> If a specific protocol knows it is very fast in writing zeroes under any 
>>> circumstance
>>> it should provide INT_MAX in bs->bl.max_write_zeroes. It is then still 
>>> allowed to
>>> return -ENOTSUP if the request size or alignment doesn't fit.
>>
>> the idea is that (from my point of view) if .bdrv_co_do_write_zeroes is
>> specified, the cost is almost the same for any amount of zeroes
>> written. This is true for fallocate from my point of view. The amount
>> of actually written data will be in several orders less than specified
>> except slow path, which honors 32 MB limit.
>>
>> If the operation is complex in realization, then it will be rate-limited
>> below, in actual implementation.
>>
>>> There are known backends e.g. anything that deals with SCSI that have a 
>>> known
>>> limitation of the maximum number of zeroes they can write fast in a single 
>>> request.
>>> This number MUST NOT be exceeded. The below patch would break all those 
>>> backends.
>>
>> could you pls point me this backends. Actually, from my point of
>> view, they should properly setup max_write_zeroes for themselves.
>> This is done at least in block/iscsi.c and it would be consistent
>> way of doing so.
>>
>>>
>>> What issue are you trying to fix with this patch? Maybe there is a better 
>>> way to fix
>>> it at another point in the code.
>>>
>>
>> I am trying to minimize amount of metadata updates for a file.
>> This provides some speedup even on ext4 and this will provide
>> even more speedup with a distributed filesystem like CEPH
>> where size updates of the files and block allocation are
>> costly.
>>
>> Regards,
>>     Den
> First of all, the patch is really wrong :) It was written using
> wrong assumptions.
>
> OK. I have spent some time reading your original patchset and
> and did not found any useful justification for default limit
> for both discard and write zero.

32768 is the largest power of two fitting into a uint16.
And uint16 is quite common for nb_sectors in backends.

>
> Yes, there are drivers which requires block level to call
> .bdrv_co_do_write_zeroes with alignment and with upper limit.
> But in this case driver setups max_write_zeroes. All buggy
> drivers should do that not to affect not buggy ones from
> my opinion.
>
> This is the only purpose of the original patches for limits.
> I have wrongly interpret BlockLimits as something connected
> with time of the operation. Sorry for that.
>
> Therefore there is no good reason for limiting the amount of
> data sent to drv->bdrv_co_writev with any data size. The only
> thing is that it would be good not to allocate too many memory
> at once. We could do something like
>
> base = qemu_try_blockalign(bs, MIN(2048, num) * BDRV_SECTOR_SIZE);
> added = 0;
> for (added = 0; added < num; add += MIN(2048, num)) {
>     qemu_iovec_add(qiov, base, MIN(2048, num));
> }
>
> to avoid really big allocations here even if .max_write_zeroes is
> very high. Do you think that this might be useful?
>
> As for .bdrv_co_do_write_zeroes itself, can we still drop
> default 32 Mb limit? If there are some buggy drivers, they
> should have .max_write_zeroes specified.
>
> The same applies to .max_discard

Its always risky to change default behaviour. In the original discussion we
agreed that there should be a limit for each request. I think the 2^15 was
Paolos suggestion.

You where talking of metadata updates for a file. So the operation that is too 
slow
for you is bdrv_write_zeroes inside a container file? What is the underlaying 
filesystem?
What is the exact operation that you try to optimize?

I am wondering because as far as I can see write zeroes is only supported for
XFS and block devices which support BLKZEROOUT. The latter only works for
cache=none. So its not that easy to end up in an optimized (fast) path anyway.

Peter




reply via email to

[Prev in Thread] Current Thread [Next in Thread]