From: David Gibson
Subject: Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
Date: Wed, 17 Feb 2021 13:31:29 +1100

On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> Handling errors in memory hotunplug in the pSeries machine is more complex
> than for any other device type, because it has all the complications that
> other devices have, and more.
> 
> For instance, determining a timeout for a DIMM hotunplug must consider whether
> it's a Hash-MMU or a Radix-MMU guest, because Hash guests take longer to
> hotunplug DIMMs. The size of the DIMM is also a factor, given that larger
> DIMMs naturally take longer to be hotunplugged from the kernel. And there's
> also the guest memory usage to be considered: if there's a process that is
> consuming memory that would be lost by the DIMM unplug, the kernel will
> postpone the unplug process until the process finishes, and then initiate the
> regular hotunplug process. The first two considerations are manageable, but
> the last one is a deal breaker.
> 
> There is no sane way for the pSeries machine to determine the memory load in
> the guest when attempting a DIMM hotunplug - and even if there were a way, the
> guest can start using all the RAM in the middle of the unplug process and
> invalidate our previous assumptions - and as a result we can't even begin to
> calculate a timeout for the operation. This means that we can't implement a
> viable timeout mechanism for memory unplug in pSeries.
> 
> Going back to why we would consider an unplug timeout, the reason is that we
> can't know if the kernel is giving up the unplug. Turns out that, sometimes,
> we can. Consider a failed memory hotunplug attempt where the kernel will error
> out with the following message:
> 
> 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
> 
> This happens when there is a LMB that the kernel gave up on removing, and the
> LMBs marked for removal of the same DIMM are now being added back. This
> process happens

We need to be a little careful about terminology here.  From the
guest's point of view, there's no such thing as a DIMM, only LMBs.
What the guest is doing here is essentially rejecting a single "index
+ number" DRC unplug request, which corresponds to one DIMM on the
qemu side.
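
To make that mapping concrete, here is a minimal standalone sketch (not
QEMU code) of how one DIMM corresponds to a contiguous "first id + count"
range of LMBs. It assumes the 256 MiB LMB size that
SPAPR_MEMORY_BLOCK_SIZE has in QEMU, and derives the id as
addr / block size the same way the loop in this patch does. LMB_SIZE,
LmbRange and dimm_to_lmb_range() are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 256 MiB per LMB, mirroring SPAPR_MEMORY_BLOCK_SIZE. */
#define LMB_SIZE (UINT64_C(1) << 28)

/* Hypothetical type: the range of LMB DRC ids backing a single DIMM. */
typedef struct LmbRange {
    uint32_t first_drc_id;  /* id of the first LMB: addr / LMB_SIZE */
    uint32_t count;         /* number of LMBs the DIMM spans */
} LmbRange;

/* Translate a DIMM's guest address and size into its LMB range. */
static LmbRange dimm_to_lmb_range(uint64_t addr, uint64_t size)
{
    LmbRange r;

    /* This sketch only handles DIMMs aligned to whole LMBs. */
    assert(addr % LMB_SIZE == 0 && size % LMB_SIZE == 0);

    r.first_drc_id = (uint32_t)(addr / LMB_SIZE);
    r.count = (uint32_t)(size / LMB_SIZE);
    return r;
}
```

With these assumptions, a 1 GiB DIMM placed at guest address 1 GiB is
seen by the guest as four LMBs, starting at id 4.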

> in the pseries kernel in [1], dlpar_memory_remove_by_ic() into
> dlpar_add_lmb(), and after that update_lmb_associativity_index(). In this
> function, the kernel is configuring the LMB DRC connector again. Note that
> this is a valid usage in LOPAR, as stated in section
> "ibm,configure-connector RTAS Call":
> 
> 'A subsequent sequence of calls to ibm,configure-connector with the same
> entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart the
> configuration of devices which were not completely configured.'
> 
> We can use this kernel behavior in our favor. If a DRC connector
> reconfiguration for a LMB that we marked as unplug pending happens, this
> indicates that the kernel changed its mind about the unplug and is reasserting
> that it will keep using the DIMM. In this case, it's safe to assume that the
> whole DIMM unplug was cancelled.
> 
> This patch hooks into rtas_ibm_configure_connector() and, in the scenario
> described above, clears the unplug state for the DIMM device. This will not
> solve all the problems we still have with memory unplug, but it will cover
> this case where the kernel reconfigures LMBs after a failed unplug. We are a
> bit more resilient, without using an unreliable timeout, and we didn't make
> the remaining error cases any worse.

I wonder if we could use this as a beginning of a hotplug failure
reporting mechanism.  As noted, this is explicitly allowed by PAPR and
I think in general it makes sense that a configure-connector would
re-assert that the guest is using the resource and we can't unplug it.

Could we extend guests to do an indicative configure-connector on any
unplug they know they can't complete?  Or, if configure-connector is too
disruptive, could we use an (extra) H_SET_INDICATOR to the "UNISOLATE"
state? If I'm reading right, that should be both permitted and a no-op
for existing PAPR implementations, so it should be a pretty safe way
to add that indication.

> 
> [1] arch/powerpc/platforms/pseries/hotplug-memory.c
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c         | 30 ++++++++++++++++++++++++++++++
>  hw/ppc/spapr_drc.c     | 14 ++++++++++++++
>  include/hw/ppc/spapr.h |  2 ++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ecce8abf14..4bcded4a1a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>      return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>  }
>  
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm)
> +{
> +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
> +    SpaprDrc *drc;
> +    uint32_t nr_lmbs;
> +    uint64_t size, addr_start, addr;
> +    int i;
> +
> +    if (ds) {
> +        spapr_pending_dimm_unplugs_remove(spapr, ds);
> +    }

Hrm... how would !ds arise?  Could this just be an assert?

> +
> +    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
> +    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +
> +    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +                                          &error_abort);
> +
> +    addr = addr_start;
> +    for (i = 0; i < nr_lmbs; i++) {
> +        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
> +                              addr / SPAPR_MEMORY_BLOCK_SIZE);
> +        g_assert(drc);
> +
> +        drc->unplug_requested = false;
> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
> +    }
> +}
> +
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c143bfb6d3..eae941233a 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
>  
>      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> +    /*
> +     * This indicates that the kernel is reconfiguring a LMB due to
> +     * a failed hotunplug. Clear the pending unplug state for the whole
> +     * DIMM.
> +     */
> +    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
> +        drc->unplug_requested) {
> +
> +        /* This really shouldn't happen in this point, but ... */
> +        g_assert(drc->dev);

I'm a little worried that a buggy or malicious guest could trigger
this assert.
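
One possible shape for avoiding that, as a standalone sketch with
stand-in types rather than the real QEMU structures: treat a missing
device as a guest-visible error instead of aborting. SketchDrc, the
SKETCH_RTAS_OUT_* values and sketch_cancel_unplug_on_reconfigure() are
all hypothetical; in the real function this would presumably map onto
the existing RTAS error return path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for RTAS return codes; the values are assumptions. */
enum {
    SKETCH_RTAS_OUT_SUCCESS = 0,
    SKETCH_RTAS_OUT_PARAM_ERROR = -3,
};

/* Stand-in for the two SpaprDrc fields the new check looks at. */
typedef struct SketchDrc {
    void *dev;              /* attached device, NULL if none */
    bool unplug_requested;  /* unplug pending on this connector */
} SketchDrc;

/*
 * Called when the guest reconfigures a connector. If an unplug was
 * pending, cancel it; if the connector has no device, report an error
 * to the guest rather than asserting.
 */
static int sketch_cancel_unplug_on_reconfigure(SketchDrc *drc)
{
    if (!drc->unplug_requested) {
        return SKETCH_RTAS_OUT_SUCCESS;  /* nothing pending, nothing to do */
    }

    if (!drc->dev) {
        /* Guest-reachable state: fail the call, keep the host running. */
        return SKETCH_RTAS_OUT_PARAM_ERROR;
    }

    drc->unplug_requested = false;
    return SKETCH_RTAS_OUT_SUCCESS;
}
```

The point of the sketch is only that a state the guest can provoke ends
in an error code for the guest, not in a host-side abort.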

> +
> +        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
> +    }
> +
>      if (!drc->fdt) {
>          void *fdt;
>          int fdt_size;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ccbeeca1de..5bcc8f3bb8 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>  void spapr_clear_pending_events(SpaprMachineState *spapr);
>  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
