qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ib


From: Bharata B Rao
Subject: Re: [Qemu-devel] [PATCH v3] spapr: Ensure all LMBs are represented in ibm, dynamic-memory
Date: Wed, 8 Jun 2016 08:05:54 +0530
User-agent: Mutt/1.5.23 (2014-03-12)

On Tue, Jun 07, 2016 at 06:37:28PM -0500, Michael Roth wrote:
> Quoting Bharata B Rao (2016-06-07 00:19:03)
> > Memory hotplug can fail for some combinations of RAM and maxmem when
> > DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
> > on maximum addressable memory returned by guest and this value is currently
> > being calculated wrongly by the guest kernel routine memory_hotplug_max().
> > While there is an attempt to fix the guest kernel, this patch works
> > around the problem within QEMU itself.
> > 
> > memory_hotplug_max() routine in the guest kernel arrives at max
> > addressable memory by multiplying lmb-size with the lmb-count obtained
> > from ibm,dynamic-memory property. There are two assumptions here:
> > 
> > - All LMBs are part of ibm,dynamic memory: This is not true for PowerKVM
> >   where only hot-pluggable LMBs are present in this property.
> > - The memory area comprising of RAM and hotplug region is contiguous: This
> >   needn't be true always for PowerKVM as there can be gap between
> >   boot time RAM and hotplug region.
> > 
> > To work around this guest kernel bug, ensure that ibm,dynamic-memory
> > has information about all the LMBs (RMA, boot-time LMBs, future
> > hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and
> > hotpluggable region).
> > 
> > RMA is represented separately by address@hidden node. Hence mark RMA LMBs
> > and also the LMBs for the gap b/n RAM and hotpluggable region as
> > reserved so that these LMBs are not recounted/counted by guest.
> > 
> > Signed-off-by: Bharata B Rao <address@hidden>
> > ---
> > Changes in v3:
> > 
> > - Not touching spapr_create_lmb_dr_connectors() so that we continue
> >   to have DRC objects for only hotpluggable LMBs.
> > - Simplified the logic of creating dynamic-memory node based on comments
> >   from Michael Roth and David Gibson.
> > 
> > v2: https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg01316.html
> > 
> >  hw/ppc/spapr.c         | 51 
> > ++++++++++++++++++++++++++++++++------------------
> >  include/hw/ppc/spapr.h |  5 +++--
> >  2 files changed, 36 insertions(+), 20 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 0636642..9d1d43d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -762,14 +762,17 @@ static int 
> > spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> >      int ret, i, offset;
> >      uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
> >      uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
> > -    uint32_t nr_lmbs = (machine->maxram_size - machine->ram_size)/lmb_size;
> > +    uint32_t hotplug_lmb_start = spapr->hotplug_memory.base / lmb_size;
> > +    uint32_t nr_lmbs = (spapr->hotplug_memory.base +
> > +                       memory_region_size(&spapr->hotplug_memory.mr)) /
> > +                       lmb_size;
> >      uint32_t *int_buf, *cur_index, buf_len;
> >      int nr_nodes = nb_numa_nodes ? nb_numa_nodes : 1;
> > 
> >      /*
> > -     * Don't create the node if there are no DR LMBs.
> > +     * Don't create the node if there is no hotpluggable memory
> >       */
> > -    if (!nr_lmbs) {
> > +    if (machine->ram_size == machine->maxram_size) {
> >          return 0;
> >      }
> > 
> > @@ -805,24 +808,36 @@ static int 
> > spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt)
> >      for (i = 0; i < nr_lmbs; i++) {
> >          sPAPRDRConnector *drc;
> >          sPAPRDRConnectorClass *drck;
> 
> Since these ^ are only used if (i >= hotplug_lmb_start), it might be
> clearer to move them there now.

Yes.

> 
> > -        uint64_t addr = i * lmb_size + spapr->hotplug_memory.base;;
> > +        uint64_t addr = i * lmb_size;
> >          uint32_t *dynamic_memory = cur_index;
> > 
> > -        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > -                                       addr/lmb_size);
> > -        g_assert(drc);
> > -        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > -
> > -        dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > -        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > -        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > -        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > -        dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > -        if (addr < machine->ram_size ||
> > -                    memory_region_present(get_system_memory(), addr)) {
> > -            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > +        if (i >= hotplug_lmb_start) {
> > +            drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > +                                           addr / lmb_size);
> 
> Could just be i

Hmm I thought I got all such occurances covered :(

> 
> > +            g_assert(drc);
> > +            drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +
> > +            dynamic_memory[0] = cpu_to_be32(addr >> 32);
> > +            dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
> > +            dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
> > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > +            dynamic_memory[4] = cpu_to_be32(numa_get_node(addr, NULL));
> > +            if (memory_region_present(get_system_memory(), addr)) {
> > +                dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
> > +            } else {
> > +                dynamic_memory[5] = cpu_to_be32(0);
> > +            }
> >          } else {
> > -            dynamic_memory[5] = cpu_to_be32(0);
> > +            /*
> > +             * LMB information for RMA, boot time RAM and gap b/n RAM and
> > +             * hotplug memory region -- all these are marked as reserved.
> > +             */
> > +            dynamic_memory[0] = cpu_to_be32(0);
> > +            dynamic_memory[1] = cpu_to_be32(0);
> 
> Are we sure we shouldn't still encode the addr here?

Since kernel won't look at reserved LMBs, I thought it should be fine.
We could populate the addr, but we don't have DRC for them and hence
no DRC index, so anyway we will not have full information.

> 
> > +            dynamic_memory[2] = cpu_to_be32(0);
> > +            dynamic_memory[3] = cpu_to_be32(0); /* reserved */
> > +            dynamic_memory[4] = cpu_to_be32(-1);
> > +            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_RESERVED);
> 
> LoPAPR Table 248 defines a "DRC invalid" bit at 0x00000020, which I
> think is what we'll want for cases where there's no backing DRC.

Seems to work. I started with reserved based on Nathan's original
suggestion.

So Nathan, which would be more appropriate here from kernel point of view ?
Reserved or 'DRC Invalid' ?

Regards,
Bharata.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]