Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?

From: Andrey Korolyov
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
Date: Thu, 14 May 2015 16:42:02 +0300

On Wed, Aug 27, 2014 at 9:43 AM, Chris Friesen
<address@hidden> wrote:
> On 08/25/2014 03:50 PM, Chris Friesen wrote:
>> I think I might have a glimmering of what's going on.  Someone please
>> correct me if I get something wrong.
>> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
>> respect to max inflight operations, and neither does virtio-blk calling
>> virtio_add_queue() with a queue size of 128.
>> I think what's happening is that virtio_blk_handle_output() spins,
>> pulling data off the 128-entry queue and calling
>> virtio_blk_handle_request().  At this point that queue entry can be
>> reused, so the queue size isn't really relevant.
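
That handler, roughly sketched from the virtio-blk code of that era
(a simplified sketch, and details vary by version), looks like:

    static void virtio_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
    {
        VirtIOBlock *s = to_virtio_blk(vdev);
        VirtIOBlockReq *req;
        MultiReqBuffer mrb = { .num_writes = 0 };

        /* Drain the vring: each pop returns that descriptor to the
         * guest for reuse, so the 128-entry ring never bounds how
         * many requests get handed down to the block layer. */
        while ((req = virtio_blk_get_request(s))) {
            virtio_blk_handle_request(req, &mrb);
        }

        /* Submit whatever writes are still batched up. */
        virtio_submit_multiwrite(s->bs, &mrb);
    }
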
>> In virtio_blk_handle_write() we add the request to a MultiReqBuffer and
>> every 32 writes we'll call virtio_submit_multiwrite() which calls down
>> into bdrv_aio_multiwrite().  That tries to merge requests and then for
>> each resulting request calls bdrv_aio_writev() which ends up calling
>> qemu_rbd_aio_writev(), which calls rbd_start_aio().
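
The batching step is approximately this (again a simplified sketch;
the 32 is just the size of the blkreq array in MultiReqBuffer):

    typedef struct MultiReqBuffer {
        BlockRequest blkreq[32];
        unsigned int num_writes;
    } MultiReqBuffer;

    static void virtio_blk_handle_write(VirtIOBlockReq *req, MultiReqBuffer *mrb)
    {
        /* Flush a full batch first -- this caps the batch size at 32,
         * but says nothing about the total number in flight. */
        if (mrb->num_writes == 32) {
            virtio_submit_multiwrite(req->dev->bs, mrb);
        }

        BlockRequest *blkreq = &mrb->blkreq[mrb->num_writes++];
        blkreq->sector     = req->out->sector;
        blkreq->nb_sectors = req->qiov.size / BDRV_SECTOR_SIZE;
        blkreq->qiov       = &req->qiov;
        blkreq->cb         = virtio_blk_rw_complete;
        blkreq->opaque     = req;
    }
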
>> rbd_start_aio() allocates a buffer and converts from iovec to a single
>> buffer.  This buffer stays allocated until the request is acked, which
>> is where the bulk of the memory overhead with rbd is coming from (has
>> anyone considered adding iovec support to rbd to avoid this extra copy?).
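
The core of that write path (trimmed from block/rbd.c as it stood,
down to the allocation in question) is something like:

    /* One contiguous allocation per in-flight request... */
    acb->bounce = qemu_blockalign(bs, qiov->size);
    /* ...plus a full copy to flatten the scatter/gather list,
     * since rbd_aio_write() takes a single flat buffer. */
    qemu_iovec_to_buf(qiov, 0, acb->bounce, qiov->size);

    rbd_aio_create_completion(acb, (rbd_callback_t) rbd_finish_aiocb, &c);
    rbd_aio_write(s->image, off, size, acb->bounce, c);

    /* acb->bounce is only freed once the cluster acks the write, so
     * every unacked request pins its full payload in RSS. */
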
>> The only limit I see in the whole call chain from
>> virtio_blk_handle_request() on down is the call to
>> bdrv_io_limits_intercept() in bdrv_co_do_writev().  However, that
>> doesn't provide any limit on the absolute number of inflight operations,
>> only on operations/sec.  If the ceph server cluster can't keep up with
>> the aggregate load, then the number of inflight operations can still
>> grow indefinitely.
>> Chris
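
Put differently, the throttle bounds the arrival rate, not the
backlog. As a back-of-the-envelope illustration (numbers invented):
if the guest is allowed R = 1000 write ops/s but the cluster sustains
only C = 400 ops/s, then after t seconds roughly

    inflight(t) ~= (R - C) * t = 600 * t

requests, each pinning its bounce buffer, are outstanding, and
nothing in the chain stops that from growing until memory runs out.
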
> I was a bit concerned that I'd need to extend the IO throttling code to
> support a limit on total inflight bytes, but it doesn't look like that will
> be necessary.
> It seems that using mallopt() to set the trim/mmap thresholds to 128K is
> enough to minimize the increase in RSS and also drop it back down after an
> I/O burst.  For now this looks like it should be sufficient for our
> purposes.
> I'm actually a bit surprised I didn't have to go lower, but it seems to work
> for both "dd" and dbench testcases so we'll give it a try.
> Chris
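
For reference, those are glibc knobs; a minimal sketch of the tuning
described above, with the same 128K values:

    #include <malloc.h>

    int main(void)
    {
        /* Return free()d heap to the kernel once 128K of trimmable
         * space accumulates at the top of the heap. */
        mallopt(M_TRIM_THRESHOLD, 128 * 1024);

        /* Serve allocations of >= 128K via mmap() so they go back to
         * the kernel immediately on free().  Setting this explicitly
         * also disables glibc's dynamic threshold adjustment. */
        mallopt(M_MMAP_THRESHOLD, 128 * 1024);

        /* I/O-heavy workload would run here. */
        return 0;
    }

With bounce buffers typically larger than 128K, each one becomes its
own mapping and is unmapped as soon as the request is acked, which is
why RSS drops back down after a burst.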

Bumping this...

For now, we still occasionally hit an unbounded cache growth issue,
observable on all post-1.4 versions of qemu with an rbd backend in
writeback mode and a certain pattern of guest operations. The issue
is confirmed for virtio and can be re-triggered by issuing an
excessive number of write requests while the acks returned from the
emulator's cache are not consumed in a timely manner. Since most
applications behave correctly, the OOM is very rare (and we developed
an ugly workaround for such situations long ago). If anybody is
interested in fixing this, I can send a prepared image for
reproduction, or instructions for making one, whichever is
preferable.

