Re: [Qemu-devel] [PATCH] throttle-groups: fix hang when group member leaves
Tue, 31 Jul 2018 18:47:53 +0200
On Wed 04 Jul 2018 04:54:10 PM CEST, Stefan Hajnoczi wrote:
> Throttle groups consist of members sharing one throttling state
> (including bps/iops limits). Round-robin scheduling is used to ensure
> fairness. If a group member already has a timer pending then other
> group members do not schedule their own timers. The next group
> member will have its turn when the existing timer expires.
> A hang may occur when a group member leaves while it had a timer
Ok, I can reproduce this if I run fio with iodepth=1.
We're draining the BDS before removing it from a throttle group, and
therefore there cannot be any pending requests.
So the problem seems to be that when throttle_co_drain_begin() runs the
pending requests from a member using throttle_group_co_restart_queue(),
it simply uses qemu_co_queue_next() and doesn't touch the timer at all.
So it can happen that there's a request in the queue waiting for a
timer, and after that call the request is gone but the timer remains.
The current patch is perhaps not worth touching at this point (we're
about to release QEMU 3.0), but I think that a better solution would be
either of these:
a) cancel the existing timer and reset tg->any_timer_armed on the given
tgm after throttle_group_co_restart_queue() and before
schedule_next_request() if the queue is empty.
b) force the existing timer to run immediately instead of calling
throttle_group_co_restart_queue(). Seems cleaner, but I haven't tried
this one yet.
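
To make the sequence concrete, here is a rough toy model of the stale
timer and of option a). It is only a sketch and not QEMU code: ToyGroup,
ToyMember and all the toy_* functions are hypothetical names made up for
illustration, and the only thing borrowed from the real code is the idea
of a per-group any_timer_armed flag. It assumes that a timer owned by a
member that has left can never fire.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for a throttle group (not the QEMU struct). */
typedef struct ToyGroup {
    bool any_timer_armed; /* models tg->any_timer_armed */
    int timer_owner;      /* id of the member whose timer is armed */
    bool timer_alive;     /* false once the timer can no longer fire */
} ToyGroup;

/* Hypothetical stand-in for a throttle group member. */
typedef struct ToyMember {
    int id;
    int queued;           /* throttled requests waiting for a timer */
} ToyMember;

static void toy_group_init(ToyGroup *tg)
{
    tg->any_timer_armed = false;
    tg->timer_owner = NO_OWNER;
    tg->timer_alive = false;
}

/* Queue a throttled request; arm a timer only if the group has none. */
static void toy_queue_request(ToyGroup *tg, ToyMember *m)
{
    m->queued++;
    if (!tg->any_timer_armed) {
        tg->any_timer_armed = true;
        tg->timer_owner = m->id;
        tg->timer_alive = true;
    }
}

/*
 * Drain: restart every queued request of this member. Without
 * cancel_stale_timer the timer is left untouched, as described above.
 * With it, option a) is modelled: the queue is now empty, so cancel
 * the member's timer and reset the group flag.
 */
static void toy_drain(ToyGroup *tg, ToyMember *m, bool cancel_stale_timer)
{
    m->queued = 0;
    if (cancel_stale_timer && tg->any_timer_armed &&
        tg->timer_owner == m->id) {
        tg->any_timer_armed = false;
        tg->timer_owner = NO_OWNER;
        tg->timer_alive = false;
    }
}

/* The member leaves; its timer goes away but the flag is not reset. */
static void toy_leave(ToyGroup *tg, ToyMember *m)
{
    assert(m->queued == 0); /* the member was drained before removal */
    if (tg->timer_owner == m->id) {
        tg->timer_alive = false;
    }
}

/* The group is stuck if it believes a timer is armed that cannot fire. */
static bool toy_group_hangs(const ToyGroup *tg)
{
    return tg->any_timer_armed && !tg->timer_alive;
}

/* A queues one request, is drained and leaves; then B queues one. */
bool toy_scenario_hangs(bool cancel_stale_timer)
{
    ToyGroup tg;
    ToyMember a = { 0, 0 };
    ToyMember b = { 1, 0 };

    toy_group_init(&tg);
    toy_queue_request(&tg, &a);
    toy_drain(&tg, &a, cancel_stale_timer);
    toy_leave(&tg, &a);
    toy_queue_request(&tg, &b);
    return toy_group_hangs(&tg);
}
```

In this model toy_scenario_hangs(false) returns true, because B never
arms its own timer while the flag is still set, and
toy_scenario_hangs(true) returns false, because B finds the flag clear
and arms a fresh timer.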
I'll explore them a bit and send a patch.