Re: [PATCH 0/3] ppc: Fix 'info pic' crash
From: David Gibson
Subject: Re: [PATCH 0/3] ppc: Fix 'info pic' crash
Date: Sun, 27 Oct 2019 18:10:42 +0100
User-agent: Mutt/1.12.1 (2019-06-15)
On Thu, Oct 24, 2019 at 04:27:16PM +0200, Greg Kurz wrote:
> The interrupt presenters are currently parented to their associated
> VCPU, and we rely on CPU_FOREACH() when we need to perform a specific
> task with them, such as exposing their state with 'info pic' or finding
> the target VCPU for an interrupt when using the XIVE controller.
>
> We recently realized that the latter could crash QEMU because CPU_FOREACH()
> can race with CPU hotplug. This was fixed by checking that the presenter
> pointer under the CPU is set (commit 627fa61746f7), but I'm not sure that
> this is enough, since the presenter pointers also become stale at some point
> during CPU unplug. And we still have other users of CPU_FOREACH(), namely
> 'info pic' with both XICS and XIVE, that have the very same problem:
>
> With XIVE:
>
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x00000001003d2848 in xive_tctx_pic_print_info (tctx=0x101ae5280,
> mon=0x7fffffffe180) at /home/greg/Work/qemu/qemu-spapr/hw/intc/xive.c:526
> 526 int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
> (gdb) p tctx
> $1 = (XiveTCTX *) 0x101ae5280
> (gdb) p tctx->cs
> $2 = (CPUState *) 0x2057512020203a5d <-- tctx is stale
> (gdb) p tctx->cs->cpu_index
> Cannot access memory at address 0x205751202020bead
>
> With XICS:
>
> Thread 1 "qemu-system-ppc" received signal SIGSEGV, Segmentation fault.
> 0x00000001003cc39c in icp_pic_print_info (icp=0x10244ccf0, mon=0x7fffffffe940)
> at /home/greg/Work/qemu/qemu-spapr/hw/intc/xics.c:47
> 47 int cpu_index = icp->cs ? icp->cs->cpu_index : -1;
> (gdb) p icp
> $1 = (ICPState *) 0x10244ccf0
> (gdb) p icp->cs
> $2 = (CPUState *) 0x524958203220 <-- icp is stale
> (gdb) p icp->cs->cpu_index
> Cannot access memory at address 0x52495820b670
>
> It may be worth finding a way to address this globally instead of
> open-coding the check of the presenter pointer everywhere, because
> that is fragile. I gave it a try with this series:
>
> [0/6] ppc: Reparent the interrupt presenter
>
> https://patchwork.ozlabs.org/cover/1182224/
>
> but it requires some more thought. Also, we're about to enter
> softfreeze, and it seems better to come up with a simpler fix.
>
> Let's forget the reparenting and check the presenter pointers
> where needed instead. Patch 1 from the previous series was changed
> to also NULLify presenter pointers, so that they can be used to
> filter out unwanted vCPUs in patch 3. I've kept patch 2 because
> it's a fix in the same area, but it isn't related to the QEMU
> crashes.
Applied to ppc-for-4.2, thanks.
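
For readers of the archive, the approach described in the quoted cover
letter (NULLify the presenter pointer, then skip vCPUs that have none
when iterating) can be sketched roughly as below. This is not the code
from the series: Presenter, VCpu, vcpu_detach_presenter() and
print_presenters() are hypothetical stand-ins, the linked list only
imitates what CPU_FOREACH() walks, and the sketch is single-threaded, so
it does not model any locking around hotplug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an interrupt presenter (ICP or XIVE TCTX). */
typedef struct Presenter {
    int pending;
} Presenter;

/* Hypothetical stand-in for a vCPU on the global CPU list. */
typedef struct VCpu {
    int cpu_index;
    Presenter *presenter;   /* NULL when no valid presenter is attached */
    struct VCpu *next;
} VCpu;

/*
 * Unplug side: clear the pointer so that later walkers see NULL
 * instead of a stale address. Freeing the presenter is left out.
 */
static void vcpu_detach_presenter(VCpu *cpu)
{
    assert(cpu != NULL);
    cpu->presenter = NULL;
}

/*
 * 'info pic'-like walker: print one line per vCPU that has a
 * presenter, skip the others, and return how many were printed.
 */
static int print_presenters(const VCpu *head, FILE *out)
{
    int printed = 0;

    for (const VCpu *cpu = head; cpu != NULL; cpu = cpu->next) {
        if (cpu->presenter == NULL) {
            continue;       /* vCPU is mid plug/unplug: filter it out */
        }
        fprintf(out, "CPU %d pending=%d\n",
                cpu->cpu_index, cpu->presenter->pending);
        printed++;
    }
    return printed;
}
```

The filter only works if the unplug path clears the pointer, which is why
the two halves are shown together here.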
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson