[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [Qemu-devel] qemu-system-ppc hangs
From: |
Richard Purdie |
Subject: |
Re: [Qemu-ppc] [Qemu-devel] qemu-system-ppc hangs |
Date: |
Tue, 21 Nov 2017 17:33:37 +0000 |
On Tue, 2017-11-21 at 09:55 +0000, Alex Bennée wrote:
> Richard Purdie <address@hidden> writes:
> > At this point I therefore wanted to seek advice on what the real
> > issue
> > is here and how to fix it!
> Code should be using cpu_interrupt() to change things but we have
> plenty
> of examples in the code of messing with cpu->interrupt_request
> directly
> which is often why the assert() in cpu_interrupt() doesn't get a
> chance
> to fire.
>
> The two most prolific direct users seems to be i386 and ppc.
>
> The i386 cases are all most likely OK as it tends to be in KVM
> emulation
> code where by definition the BQL is already held by the time you get
> there. For PPC it will depend on how you got there. The
> multi-thread-tcg.txt document describes the approach:
>
> Emulated hardware state
> -----------------------
>
> Currently thanks to KVM work any access to IO memory is
> automatically
> protected by the global iothread mutex, also known as the BQL (Big
> Qemu Lock). Any IO region that doesn't use global mutex is expected
> to
> do its own locking.
>
> However IO memory isn't the only way emulated hardware state can be
> modified. Some architectures have model specific registers that
> trigger hardware emulation features. Generally any translation
> helper
> that needs to update more than a single vCPUs of state should take
> the
> BQL.
>
> As the BQL, or global iothread mutex is shared across the system we
> push the use of the lock as far down into the TCG code as possible
> to
> minimise contention.
>
> (Current solution)
>
> MMIO access automatically serialises hardware emulation by way of
> the
> BQL. Currently ARM targets serialise all ARM_CP_IO register
> accesses
> and also defer the reset/startup of vCPUs to the vCPU context by
> way
> of async_run_on_cpu().
>
> Updates to interrupt state are also protected by the BQL as they
> can
> often be cross vCPU.
>
> So basically it comes down to the call-path into your final
> cpu_interrupt() call which is why I guess your doing the:
>
> if (!qemu_mutex_iothread_locked()) {
> qemu_mutex_lock_iothread();
> cpu_interrupt(cs, CPU_INTERRUPT_EXITTB);
> qemu_mutex_unlock_iothread();
> }
>
> dance. I suspect the helper functions are called both from device
> emulation (where BQL is held) and from vCPU helpers (where it is
> currently not).
>
> So I suggest the fix is:
>
> 1. Convert sites doing their own manipulation of
> cpu->interrupt_request() to call the helper instead
> 2. If helper directly called from TCG code:
> - take BQL before calling cpu_interrupt(), release after
> Else if helper shared between MMIO/TCG paths
> - take BQL from TCG path, call helper, release after
>
> It might be there is some sensible re-factoring that could be done to
> make things clearer but I'll leave that to the PPC experts to weigh
> in
> on.
>
> Hope that helps!
It does indeed, thanks. I've sent out a proposed patch which does the
above so people can review it and see if its the right thing to do.
Certainly its working fine locally.
Cheers,
Richard