[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [qemu-s390x] [RFC v3 0/56] per-CPU locks

From: Emilio G. Cota
Subject: Re: [qemu-s390x] [RFC v3 0/56] per-CPU locks
Date: Fri, 19 Oct 2018 15:29:32 -0400
User-agent: Mutt/1.9.4 (2018-02-28)

On Fri, Oct 19, 2018 at 18:01:18 +0200, Paolo Bonzini wrote:
> On 19/10/2018 16:50, Emilio G. Cota wrote:
> > On Fri, Oct 19, 2018 at 08:59:24 +0200, Paolo Bonzini wrote:
> >> On 19/10/2018 03:05, Emilio G. Cota wrote:
> >>> I'm calling this series a v3 because it supersedes the two series
> >>> I previously sent about using atomics for interrupt_request:
> >>>   https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02013.html
> >>> The approach in that series cannot work reliably; using (locked) atomics
> >>> to set interrupt_request but not using (locked) atomics to read it
> >>> can lead to missed updates.
> >>
> >> The idea here was that changes to protected fields are all followed by
> >> kick.  That may not have been the case, granted, but I wonder if the
> >> plan is unworkable.
> > 
> > I suspect that the cpu->interrupt_request+kick mechanism is not the issue,
> > otherwise master should not work--we do atomic_read(cpu->interrupt_request)
> > and only if that read != 0 we take the BQL.
> > 
> > My guess is that the problem is with other reads of cpu->interrupt_request,
> > e.g. those in cpu_has_work. Currently those reads happen with the
> > BQL held, and updates to cpu->interrupt_request take the BQL. If we drop
> > the BQL from the setters to instead use locked atomics (like in the
> > aforementioned series), those BQL-protected readers might miss updates.
> cpu_has_work is only needed to handle the processor's halted state (or
> is it?).  If it is, OR+kick should work.
> > Given that we need a per-CPU lock anyway to remove the BQL from the
> > CPU loop, extending this lock to protect cpu->interrupt_request is
> > a simple solution that keeps the current logic and allows for
> > greater scalability.
> Sure, I was just curious what the problem was.  KVM uses OR+kick with no
> problems.

I never found exactly where things break. The hangs happen
pretty early when booting a large (-smp > 16) x86_64 Ubuntu guest.
Booting never completes (ssh unresponsive) if I don't have the
console output (I suspect the console output slows things down
enough to hide some races). I only see a few threads busy:
a couple of vCPU threads, and the I/O thread.

I didn't have time to debug any further, so I moved on
to an alternative approach.

So it is possible that it was my implementation, and not the approach,
what was at fault :-)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]