qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 6/6] spapr: Model DR connectors as simple objects


From: Greg Kurz
Subject: Re: [PATCH 6/6] spapr: Model DR connectors as simple objects
Date: Wed, 6 Jan 2021 19:15:36 +0100

On Mon, 28 Dec 2020 19:28:39 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote:
> > Modeling DR connectors as individual devices raises some
> > concerns, as already discussed a year ago in this thread:
> > 
> > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/
> > 
> > First, high maxmem settings creates too many DRC devices.
> > This causes scalability issues. It severely increase boot
> > time because the multiple traversals of the DRC list that
> > are performed during machine setup are quadratic operations.
> > This is directly related to the fact that DRCs are modeled
> > as individual devices and added to the composition tree.
> > 
> > Second, DR connectors are really an internal concept of
> > PAPR. They aren't something that the user or management
> > layer can manipulate in any way. We already don't allow
> > their creation with device_add by clearing user_creatable.
> > 
> > DR connectors don't even need to be modeled as actual
> > devices since they don't sit in a bus. They just need
> > to be associated to an 'owner' object and to have the
> > equivalent of realize/unrealize functions.
> > 
> > Downgrade them to be simple objects. Convert the existing
> > realize() and unrealize() to be methods of the DR connector
> > base class. Also have the base class to inherit from the
> > vmstate_if interface directly. The get_id() hook simply
> > returns NULL, just as device_vmstate_if_get_id() does for
> > devices that don't sit in a bus. The DR connector is no
> > longer made a child object. This means that it must be
> > explicitely freed when no longer needed. This is only
> > required for PHBs and PCI bridges actually : have them to
> > free the DRC with spapr_dr_connector_free() instead of
> > object_unparent().
> > 
> > No longer add the DRCs to the QOM composition tree. Track
> > them with a glib hash table using the global DRC index as
> > the key instead. This makes traversal a linear operation.
> 
> I have some reservations about this one.  The main thing is that
> attaching migration state to something that's not a device seems a bit
> odd to me.  AFAICT exactly one other non-device implements
> TYPE_VMSTATE_IF, and what it does isn't very clear to me.
> 

Even with your proposal below, the current SpaprDrc type, which is
used all over the place, will stop being a TYPE_DEVICE but we still
need to support migration with existing machine types for which DRC
are devices. Implementing TYPE_VMSTATE_IF is essentially a hack that
allows to do that without keeping the current TYPE_DEVICE based
implementation around.

> As I might have mentioned to you I had a different idea for how to
> address this problem: still use a TYPE_DEVICE, but have it manage a
> whole array of DRCs as one unit, rather than just a single one.
> Specifically I was thinking:
> 
> * one array per PCI bus (DRCs for each function on the bus)
> * one array for each block of memory (so one for base memory, one for
>   each DIMM)
> * one array for all the cpus
> * one array for all the PHBs
> 
> It has some disadvantages compared to your scheme: it still leaves
> (less) devices which can't be user managed, which is a bit ugly.  On
> the other hand, each of those arrays can reasonably be dense, so we
> can use direct indexing rather than a hash table, which is a bit
> nicer.
> 
> Thoughts?
> 

I find it a bit overkill to introduce a new TYPE_DEVICE (let's
call it a DRC manager) for something that:
- doesn't sit on a bus
- can't be user managed
- isn't directly represented to the guest as a full node
  in the DT unlike all other devices, but just as indexes
  in some properties of actual DR capable devices.

Given that the DRC index space is global and this is what
the guest passes to DR RTAS calls, we can't do direct
indexing, strictly speaking. We need at least some logic
to dispatch operations on individual DRC states to the
appropriate DRC manager. This logic belongs to the machine
IMHO.

This shouldn't be too complex for CPUs and PHBs since they
sit directly under the machine and have 1:1 relation with
the attached device. It just boils down to instantiate
some DRC managers during machine init:

- range [ 0x10000000 ... 0x10000000 + ${max_cpus} [
  for CPUs
- range [ 0x20000000 ... 0x20000000 + 31 [
  for PHBs

For memory, the code currently generates DRC indexes in the range:

[ 0x80000000 ... 0x80000000 + ${base_ram_size}/256M ... ${max_ram_size}/256M [

ie. it doesn't generate DRC indexes for the base memory AFAICT. Also
each DIMM can be of arbitrary size, ie. consume an arbitrary amount
of DRC indexes. So the machine would instantiate SPAPR_MAX_RAM_SLOTS (32)
DRC managers, each capable of managing the full set of LMB DRCs, just
in case ? Likely a lot of zeroes with high maxmem settings but I guess
we can live with it.

PCI busses would need some extra care though since the machine
doesn't know about them. This would require to be able to
register/unregister DRC managers for SPAPR_DR_CONNECTOR_TYPE_PCI
indexes, so that the dispatching logic know about the ranges
they cover (PHB internals).

And finally comes migration : I cannot think of a way to generate
the VMState sections used by existing machine types out of a set
of arrays of integers... We could keep the current implementation
around and use it with older machine types, but this certainly
looks terrible from a maintenance perspective. Did you have any
suggestion to handle that ?

I seem to remember that one of the motivation to have arrays
of DRCs is to avoid the inflation of VMState sections that
we currently get with high maxmem settings, and it is considered
preferable to stream sparse arrays. This could be achieved by
building these arrays out of the global DRC hash table in a machine
pre-save handler and migrate them in a subsection for the default
machine type. Older machine types would continue with the current
VMState sections thanks to the TYPE_VMSTATE_IF hack.

Does this seem a reasonable trade-off to be able to support
older and newer machine types with the same implementation ?

> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  include/hw/ppc/spapr_drc.h |   8 +-
> >  hw/ppc/spapr_drc.c         | 166 ++++++++++++++-----------------------
> >  hw/ppc/spapr_pci.c         |   2 +-
> >  3 files changed, 69 insertions(+), 107 deletions(-)
> > 
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index 8982927d5c24..a26aa8b9d4c3 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -170,7 +170,7 @@ typedef enum {
> >  
> >  typedef struct SpaprDrc {
> >      /*< private >*/
> > -    DeviceState parent;
> > +    Object parent;
> >  
> >      uint32_t id;
> >      Object *owner;
> > @@ -193,7 +193,7 @@ struct SpaprMachineState;
> >  
> >  typedef struct SpaprDrcClass {
> >      /*< private >*/
> > -    DeviceClass parent;
> > +    ObjectClass parent;
> >      SpaprDrcState empty_state;
> >      SpaprDrcState ready_state;
> >  
> > @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass {
> >  
> >      int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
> >                         void *fdt, int *fdt_start_offset, Error **errp);
> > +
> > +    void (*realize)(SpaprDrc *drc);
> > +    void (*unrealize)(SpaprDrc *drc);
> >  } SpaprDrcClass;
> >  
> >  typedef struct SpaprDrcPhysical {
> > @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc);
> >  
> >  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >                                           uint32_t id);
> > +void spapr_dr_connector_free(SpaprDrc *drc);
> >  SpaprDrc *spapr_drc_by_index(uint32_t index);
> >  SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id);
> >  int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t 
> > drc_type_mask);
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 8571d5bafe4e..e26763f8b5a4 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -27,7 +27,6 @@
> >  #include "sysemu/reset.h"
> >  #include "trace.h"
> >  
> > -#define DRC_CONTAINER_PATH "/dr-connector"
> >  #define DRC_INDEX_TYPE_SHIFT 28
> >  #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1)
> >  
> > @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = {
> >      }
> >  };
> >  
> > -static void drc_realize(DeviceState *d, Error **errp)
> > +static GHashTable *drc_hash_table(void)
> >  {
> > -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> > -    Object *root_container;
> > -    gchar *link_name;
> > -    const char *child_name;
> > +    static GHashTable *dht;
> >  
> > +    if (!dht) {
> > +        dht = g_hash_table_new(NULL, NULL);
> > +    }
> > +
> > +    return dht;
> > +}
> > +
> > +
> > +static void drc_realize(SpaprDrc *drc)
> > +{
> >      trace_spapr_drc_realize(spapr_drc_index(drc));
> > -    /* NOTE: we do this as part of realize/unrealize due to the fact
> > -     * that the guest will communicate with the DRC via RTAS calls
> > -     * referencing the global DRC index. By unlinking the DRC
> > -     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
> > -     * inaccessible by the guest, since lookups rely on this path
> > -     * existing in the composition tree
> > -     */
> > -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > -    link_name = g_strdup_printf("%x", spapr_drc_index(drc));
> > -    child_name = object_get_canonical_path_component(OBJECT(drc));
> > -    trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name);
> > -    object_property_add_alias(root_container, link_name,
> > -                              drc->owner, child_name);
> > -    g_free(link_name);
> > +
> > +    g_hash_table_insert(drc_hash_table(),
> > +                        GUINT_TO_POINTER(spapr_drc_index(drc)), drc);
> >      vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), 
> > &vmstate_spapr_drc,
> >                       drc);
> >      trace_spapr_drc_realize_complete(spapr_drc_index(drc));
> >  }
> >  
> > -static void drc_unrealize(DeviceState *d)
> > +static void drc_unrealize(SpaprDrc *drc)
> >  {
> > -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> > -    Object *root_container;
> > -    gchar *name;
> > -
> >      trace_spapr_drc_unrealize(spapr_drc_index(drc));
> >      vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc);
> > -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > -    name = g_strdup_printf("%x", spapr_drc_index(drc));
> > -    object_property_del(root_container, name);
> > -    g_free(name);
> > +    g_hash_table_remove(drc_hash_table(),
> > +                        GUINT_TO_POINTER(spapr_drc_index(drc)));
> >  }
> >  
> >  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >                                           uint32_t id)
> >  {
> >      SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
> > -    char *prop_name;
> >  
> >      drc->id = id;
> > -    drc->owner = owner;
> > -    prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> > -                                spapr_drc_index(drc));
> > -    object_property_add_child(owner, prop_name, OBJECT(drc));
> > -    object_unref(OBJECT(drc));
> > -    qdev_realize(DEVICE(drc), NULL, NULL);
> > -    g_free(prop_name);
> > +    drc->owner = object_ref(owner);
> > +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc);
> >  
> >      return drc;
> >  }
> >  
> > +void spapr_dr_connector_free(SpaprDrc *drc)
> > +{
> > +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc);
> > +    object_unref(drc->owner);
> > +    object_unref(drc);
> > +}
> > +
> >  static void spapr_dr_connector_instance_init(Object *obj)
> >  {
> >      SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj);
> > @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object 
> > *obj)
> >      drc->state = drck->empty_state;
> >  }
> >  
> > +static char *drc_vmstate_if_get_id(VMStateIf *obj)
> > +{
> > +    return NULL;
> > +}
> > +
> >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >  {
> > -    DeviceClass *dk = DEVICE_CLASS(k);
> > +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> > +    VMStateIfClass *vc = VMSTATE_IF_CLASS(k);
> >  
> > -    dk->realize = drc_realize;
> > -    dk->unrealize = drc_unrealize;
> > -    /*
> > -     * Reason: DR connector needs to be wired to either the machine or to a
> > -     * PHB in spapr_dr_connector_new().
> > -     */
> > -    dk->user_creatable = false;
> > +    drck->realize = drc_realize;
> > +    drck->unrealize = drc_unrealize;
> > +    vc->get_id = drc_vmstate_if_get_id;
> >  }
> >  
> >  static bool drc_physical_needed(void *opaque)
> > @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque)
> >      }
> >  }
> >  
> > -static void realize_physical(DeviceState *d, Error **errp)
> > +static void realize_physical(SpaprDrc *drc)
> >  {
> > -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> > -    Error *local_err = NULL;
> > -
> > -    drc_realize(d, &local_err);
> > -    if (local_err) {
> > -        error_propagate(errp, local_err);
> > -        return;
> > -    }
> > +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
> >  
> > +    drc_realize(drc);
> >      vmstate_register(VMSTATE_IF(drcp),
> >                       spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
> >                       &vmstate_spapr_drc_physical, drcp);
> >      qemu_register_reset(drc_physical_reset, drcp);
> >  }
> >  
> > -static void unrealize_physical(DeviceState *d)
> > +static void unrealize_physical(SpaprDrc *drc)
> >  {
> > -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> > +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
> >  
> > -    drc_unrealize(d);
> > -    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, 
> > drcp);
> >      qemu_unregister_reset(drc_physical_reset, drcp);
> > +    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, 
> > drcp);
> > +    drc_unrealize(drc);
> >  }
> >  
> >  static void spapr_drc_physical_class_init(ObjectClass *k, void *data)
> >  {
> > -    DeviceClass *dk = DEVICE_CLASS(k);
> >      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> >  
> > -    dk->realize = realize_physical;
> > -    dk->unrealize = unrealize_physical;
> > +    drck->realize = realize_physical;
> > +    drck->unrealize = unrealize_physical;
> >      drck->dr_entity_sense = physical_entity_sense;
> >      drck->isolate = drc_isolate_physical;
> >      drck->unisolate = drc_unisolate_physical;
> > @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, 
> > void *data)
> >  
> >  static const TypeInfo spapr_dr_connector_info = {
> >      .name          = TYPE_SPAPR_DR_CONNECTOR,
> > -    .parent        = TYPE_DEVICE,
> > +    .parent        = TYPE_OBJECT,
> >      .instance_size = sizeof(SpaprDrc),
> >      .instance_init = spapr_dr_connector_instance_init,
> >      .class_size    = sizeof(SpaprDrcClass),
> >      .class_init    = spapr_dr_connector_class_init,
> >      .abstract      = true,
> > +    .interfaces = (InterfaceInfo[]) {
> > +        { TYPE_VMSTATE_IF },
> > +        { }
> > +    },
> >  };
> >  
> >  static const TypeInfo spapr_drc_physical_info = {
> > @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = {
> >  
> >  SpaprDrc *spapr_drc_by_index(uint32_t index)
> >  {
> > -    Object *obj;
> > -    gchar *name;
> > -
> > -    name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index);
> > -    obj = object_resolve_path(name, NULL);
> > -    g_free(name);
> > -
> > -    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
> > +    return
> > +        SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(),
> > +                                               GUINT_TO_POINTER(index)));
> >  }
> >  
> >  SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
> > @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t 
> > id)
> >   */
> >  int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t 
> > drc_type_mask)
> >  {
> > -    Object *root_container;
> > -    ObjectProperty *prop;
> > -    ObjectPropertyIterator iter;
> > +    GHashTableIter iter;
> >      uint32_t drc_count = 0;
> >      GArray *drc_indexes, *drc_power_domains;
> >      GString *drc_names, *drc_types;
> >      int ret;
> > +    SpaprDrc *drc;
> >  
> >      /*
> >       * This should really be only called once per node since it overwrites
> > @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object 
> > *owner, uint32_t drc_type_mask)
> >      drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> >      drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> >  
> > -    /* aliases for all DRConnector objects will be rooted in QOM
> > -     * composition tree at DRC_CONTAINER_PATH
> > -     */
> > -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > -
> > -    object_property_iter_init(&iter, root_container);
> > -    while ((prop = object_property_iter_next(&iter))) {
> > -        Object *obj;
> > -        SpaprDrc *drc;
> > +    g_hash_table_iter_init(&iter, drc_hash_table());
> > +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
> >          SpaprDrcClass *drck;
> >          char *drc_name = NULL;
> >          uint32_t drc_index, drc_power_domain;
> >  
> > -        if (!strstart(prop->type, "link<", NULL)) {
> > -            continue;
> > -        }
> > -
> > -        obj = object_property_get_link(root_container, prop->name,
> > -                                       &error_abort);
> > -        drc = SPAPR_DR_CONNECTOR(obj);
> >          drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> >  
> >          if (owner && (drc->owner != owner)) {
> > @@ -951,23 +920,12 @@ out:
> >  
> >  void spapr_drc_reset_all(SpaprMachineState *spapr)
> >  {
> > -    Object *drc_container;
> > -    ObjectProperty *prop;
> > -    ObjectPropertyIterator iter;
> > +    GHashTableIter iter;
> > +    SpaprDrc *drc;
> >  
> > -    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> >  restart:
> > -    object_property_iter_init(&iter, drc_container);
> > -    while ((prop = object_property_iter_next(&iter))) {
> > -        SpaprDrc *drc;
> > -
> > -        if (!strstart(prop->type, "link<", NULL)) {
> > -            continue;
> > -        }
> > -        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> > -                                                          prop->name,
> > -                                                          &error_abort));
> > -
> > +    g_hash_table_iter_init(&iter, drc_hash_table());
> > +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
> >          /*
> >           * This will complete any pending plug/unplug requests.
> >           * In case of a unplugged PHB or PCI bridge, this will
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 76d7c91e9c64..ca0cca664e3c 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus 
> > *bus)
> >          SpaprDrc *drc = drc_from_devfn(phb, chassis, i);
> >  
> >          if (drc) {
> > -            object_unparent(OBJECT(drc));
> > +            spapr_dr_connector_free(drc);
> >          }
> >      }
> >  }
> 

Attachment: pgp0Wf6CPU5zl.pgp
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]