From: Jamie Iles
Subject: Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Date: Wed, 3 May 2023 10:44:57 +0100
Hi Richard,
On Sat, Apr 29, 2023 at 10:28:03AM +0100, Richard Henderson wrote:
> On 4/27/23 03:09, Jamie Iles wrote:
> > From: Jamie Iles <jiles@qti.qualcomm.com>
> >
> > The round-robin scheduler will iterate over the CPU list with an
> > assigned budget until the next timer expiry and may exit early because
> > of a TB exit. This is fine under normal operation but with icount
> > enabled and SMP it is possible for a CPU to be starved of run time and
> > the system live-locks.
> >
> > For example, booting a riscv64 platform with '-icount
> > shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> > has timers enabled and starts performing TLB shootdowns. In this case
> > we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> > 1. As we enter the TCG loop, we assign the icount budget to next timer
> > interrupt to CPU 0 and begin executing where the guest is sat in a busy
> > loop exhausting all of the budget before we try to execute CPU 1 which
> > is the target of the IPI but CPU 1 is left with no budget with which to
> > execute and the process repeats.
> >
> > We try here to add some fairness by splitting the budget across all of
> > the CPUs on the thread fairly before entering each one. The CPU count
> > is cached on CPU list generation ID to avoid iterating the list on each
> > loop iteration. With this change it is possible to boot an SMP rv64
> > guest with icount enabled and no hangs.
> >
> > New in v3 (address feedback from Richard Henderson):
> > - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
> > appropriate
> > - Move rr_cpu_count() call to be conditional on icount_enabled()
> > - Initialize cpu_budget to 0
> >
> > Jamie Iles (2):
> > cpu: expose qemu_cpu_list_lock for lock-guard use
> > accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
>
> It appears as if one of these two patches causes a failure in replay, e.g.
>
> https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
>
> Would you have a look, please?
I was out on vacation and it looks like this job got cleaned up, but was
this a mutex check failing for the replay mutex and/or the iothread
mutex?  If so, the following patch fixes it for me, and I can include it
in a v4.
Thanks,
Jamie
diff --git i/accel/tcg/tcg-accel-ops-icount.c w/accel/tcg/tcg-accel-ops-icount.c
index e1e8afaf2f99..3d2cfbbc9778 100644
--- i/accel/tcg/tcg-accel-ops-icount.c
+++ w/accel/tcg/tcg-accel-ops-icount.c
@@ -114,13 +114,13 @@ void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
     g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
     g_assert(cpu->icount_extra == 0);
 
+    replay_mutex_lock();
+
     cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
     insns_left = MIN(0xffff, cpu->icount_budget);
     cpu_neg(cpu)->icount_decr.u16.low = insns_left;
     cpu->icount_extra = cpu->icount_budget - insns_left;
 
-    replay_mutex_lock();
-
     if (cpu->icount_budget == 0) {
         /*
          * We're called without the iothread lock, so must take it while
diff --git i/replay/replay.c w/replay/replay.c
index c39156c52221..0f7d766efe81 100644
--- i/replay/replay.c
+++ w/replay/replay.c
@@ -74,7 +74,7 @@ uint64_t replay_get_current_icount(void)
 int replay_get_instructions(void)
 {
     int res = 0;
-    replay_mutex_lock();
+    g_assert(replay_mutex_locked());
     if (replay_next_event_is(EVENT_INSTRUCTION)) {
         res = replay_state.instruction_count;
         if (replay_break_icount != -1LL) {
@@ -85,7 +85,6 @@
             }
         }
     }
-    replay_mutex_unlock();
     return res;
 }