Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
Thu, 14 May 2015 16:42:02 +0300
On Wed, Aug 27, 2014 at 9:43 AM, Chris Friesen wrote:
> On 08/25/2014 03:50 PM, Chris Friesen wrote:
>> I think I might have a glimmering of what's going on. Someone please
>> correct me if I get something wrong.
>> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
>> respect to max inflight operations, and neither does virtio-blk calling
>> virtio_add_queue() with a queue size of 128.
>> I think what's happening is that virtio_blk_handle_output() spins,
>> pulling data off the 128-entry queue and calling
>> virtio_blk_handle_request(). At this point that queue entry can be
>> reused, so the queue size isn't really relevant.
>> In virtio_blk_handle_write() we add the request to a MultiReqBuffer and
>> every 32 writes we'll call virtio_submit_multiwrite() which calls down
>> into bdrv_aio_multiwrite(). That tries to merge requests and then for
>> each resulting request calls bdrv_aio_writev() which ends up calling
>> qemu_rbd_aio_writev(), which calls rbd_start_aio().
>> rbd_start_aio() allocates a buffer and converts from iovec to a single
>> buffer. This buffer stays allocated until the request is acked, which
>> is where the bulk of the memory overhead with rbd is coming from (has
>> anyone considered adding iovec support to rbd to avoid this extra copy?).
>> The only limit I see in the whole call chain from
>> virtio_blk_handle_request() on down is the call to
>> bdrv_io_limits_intercept() in bdrv_co_do_writev(). However, that
>> doesn't provide any limit on the absolute number of inflight operations,
>> only on operations/sec. If the ceph server cluster can't keep up with
>> the aggregate load, then the number of inflight operations can still
>> grow indefinitely.
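The difference between a rate limit and a cap on outstanding work,
as described in the quoted analysis, can be sketched as below. This is
only an illustration under stated assumptions: InflightLimiter and the
inflight_* helpers are hypothetical names and do not exist in qemu, and
the code assumes all calls come from a single thread (no locking). The
submit path would call inflight_try_acquire() before issuing a write
and defer the request if it returns false; the completion (ack) path
would call inflight_release().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting structure; not part of qemu. */
typedef struct InflightLimiter {
    size_t max_bytes;      /* cap on bytes submitted but not yet acked */
    size_t inflight_bytes; /* bytes currently submitted and unacked */
} InflightLimiter;

static void inflight_init(InflightLimiter *l, size_t max_bytes)
{
    l->max_bytes = max_bytes;
    l->inflight_bytes = 0;
}

/*
 * Try to account for a new request of `bytes` bytes.
 * Returns false if the caller should defer submission until
 * completions have freed up budget.
 */
static bool inflight_try_acquire(InflightLimiter *l, size_t bytes)
{
    if (l->inflight_bytes != 0) {
        /* Written to avoid unsigned underflow when already over the cap. */
        if (l->inflight_bytes >= l->max_bytes ||
            bytes > l->max_bytes - l->inflight_bytes) {
            return false;
        }
    }
    /*
     * When nothing is in flight the request is always admitted, so a
     * single request larger than the cap cannot stall forever.
     */
    l->inflight_bytes += bytes;
    return true;
}

/* Called from the completion path once a request has been acked. */
static void inflight_release(InflightLimiter *l, size_t bytes)
{
    /* Clamp so a mismatched release cannot wrap the counter. */
    l->inflight_bytes -= (bytes < l->inflight_bytes) ? bytes
                                                     : l->inflight_bytes;
}
```

Unlike an operations/sec throttle, this bounds the memory held by
unacked requests regardless of how slowly the backend responds, at the
cost of stalling the guest's queue when the budget is exhausted.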
> I was a bit concerned that I'd need to extend the IO throttling code to
> support a limit on total inflight bytes, but it doesn't look like that will
> be necessary.
> It seems that using mallopt() to set the trim/mmap thresholds to 128K is
> enough to minimize the increase in RSS and also drop it back down after an
> I/O burst. For now this looks like it should be sufficient for our
> I'm actually a bit surprised I didn't have to go lower, but it seems to work
> for both "dd" and dbench testcases so we'll give it a try.
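For reference, the mallopt() tuning described in the quoted message
could look roughly like the sketch below. It assumes glibc (mallopt()
and the M_* constants are glibc-specific), and tune_malloc_thresholds()
is a hypothetical helper name, not something that exists in qemu. It
would need to run early, before the I/O burst allocates its buffers.

```c
#include <malloc.h> /* glibc: mallopt(), M_TRIM_THRESHOLD, M_MMAP_THRESHOLD */

/* 128K, the value mentioned in the quoted message. */
#define HEAP_THRESHOLD_BYTES (128 * 1024)

/*
 * Hypothetical helper (not a qemu function).
 * Returns 1 if both settings were accepted, 0 otherwise.
 */
static int tune_malloc_thresholds(void)
{
    int ok = 1;

    /* Trim free memory at the top of the heap back to the OS once it
     * exceeds 128K. mallopt() returns 1 on success and 0 on error. */
    if (mallopt(M_TRIM_THRESHOLD, HEAP_THRESHOLD_BYTES) != 1) {
        ok = 0;
    }

    /* Serve allocations of 128K or more via mmap(), so that freeing
     * them unmaps the memory immediately. */
    if (mallopt(M_MMAP_THRESHOLD, HEAP_THRESHOLD_BYTES) != 1) {
        ok = 0;
    }

    return ok;
}
```

According to the mallopt(3) manual page, 128K is also glibc's initial
value for both thresholds, but setting either one explicitly disables
the dynamic adjustment that otherwise lets them grow at run time. That
may be why the value did not have to go lower.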
We still occasionally hit an unbounded cache growth issue, which can be
observed on all post-1.4 versions of qemu with the rbd backend in
writeback mode under a certain pattern of guest operations. The issue
is confirmed for virtio and can be re-triggered by issuing an excessive
number of write requests without processing the acks returned from the
emulator's cache in a timely manner. Since most applications behave
correctly, the OOM issue is very rare (and we developed an ugly
workaround for such situations long ago). If anybody is interested in
fixing this, I can send a prepared image for reproduction or
instructions to make one, whichever is preferable.
- Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?,
Andrey Korolyov <=