[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [Qemu-devel] [PATCH 2/5] hw/ppc: removing spapr_drc_detac
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] [Qemu-devel] [PATCH 2/5] hw/ppc: removing spapr_drc_detach_cb opaques |
Date: |
Thu, 4 May 2017 17:24:34 +1000 |
User-agent: |
Mutt/1.8.0 (2017-02-23) |
On Wed, May 03, 2017 at 05:33:54PM -0300, Daniel Henrique Barboza wrote:
> Update: I have talked with Michael Roth about the spapr_release_lmb
> callback, the flow
> of the LMB releases and so on. He clarified to me that it is not possible to
> get rid of
> the callback and put its code in the spapr_del_lmbs function.
>
> The reason is that the callback is being executed by the guest via
> spapr_rtas.c:rtas_set_indicator(),
> which in turn will follow the flow of the DRC releases and eventually will
> hit the spapr_release_lmb
> callback, but this will not necessarily happen in the spam of the
> spapr_del_lmbs function. This means that
> my idea of putting the cb code in the spapr_del_lmbs and removing the
> callback not possible.
>
> On the other hand, Bharata raised the issue about the scan function in the
> callback being a problem.
> My tests with a 1 Gb unplug didn't show any noticeable delay increase but in
> theory we support memory
> unplugs of 1 Terabyte. With 1Gb of RAM we need 4 DRCs, so 1 Tb would require
> 4000 DRCs. Then we
> would scan through them 4000 times. I don't think the scan inside the
> callback is feasible in this scenario
> either.
>
> In the v9 I'll migrate the sPAPRDIMMState structure like Michael Roth
> mentioned somewhere in the
> v6 review to use it inside the spapr_lmb_release callback - looks like the
> best option we have.
I don't think you have to migrate that actual structure, just the fact
that there's an in-progress removal (which you might be able to derive
from existing migrated state anyway). You can reconstruct the nr_lmbs
state with a full scan on post_load(). That way it's only O(N) not
O(N^2), and only in the case that a migration occurs mid-unplug.
>
>
> Thanks,
>
>
> Daniel
>
>
>
> On 05/03/2017 10:56 AM, Daniel Henrique Barboza wrote:
> >
> >
> > On 05/03/2017 12:26 AM, Bharata B Rao wrote:
> > > On Tue, May 2, 2017 at 1:13 PM, Daniel Henrique Barboza
> > > <address@hidden <mailto:address@hidden>>
> > > wrote:
> > >
> > >
> > >
> > > On 05/02/2017 12:40 AM, Bharata B Rao wrote:
> > > > On Sun, Apr 30, 2017 at 10:55 PM, Daniel Henrique Barboza
> > > > <address@hidden
> > > > <mailto:address@hidden>> wrote:
> > > >
> > > > Following up the previous detach_cb change, this patch
> > > > removes the
> > > > detach_cb_opaque entirely from the code.
> > > >
> > > > The reason is that the drc->detach_cb_opaque object can't be
> > > > restored in the post load of the upcoming DRC migration and
> > > > no detach
> > > > callbacks actually need this opaque. 'spapr_core_release' is
> > > > receiving it as NULL, 'spapr_phb_remove_pci_device_cb' is
> > > > receiving
> > > > a phb object as opaque but is't using it. These were trivial
> > > > removal
> > > > cases.
> > > >
> > > > However, the LM removal callback 'spapr_lmb_release' is
> > > > receiving
> > > > and using the opaque object, a 'sPAPRDIMMState' struct. This
> > > > struct
> > > > holds the number of LMBs the DIMM object contains and the
> > > > callback
> > > > was using this counter as a countdown to check if all LMB
> > > > DRCs were
> > > > release before proceeding to the DIMM unplug. To remove the
> > > > need of
> > > > this callback we have choices such as:
> > > >
> > > > - migrate the 'sPAPRDIMMState' struct. This would require
> > > > creating a
> > > > QTAILQ to store all DIMMStates and an additional 'dimm_id'
> > > > field to
> > > > associate the DIMMState with the DIMM object. We could attach
> > > > this
> > > > QTAILQ to the 'sPAPRPHBState' and retrieve it later in the
> > > > callback.
> > > >
> > > > - fetch the state of the LMB DRCs directly by scanning the
> > > > state of
> > > > them and, if all of them are released, proceed with the DIMM
> > > > unplug.
> > > >
> > > > The second approach was chosen. The new
> > > > 'spapr_all_lmbs_drcs_released'
> > > > function scans all LMBs of a given DIMM device to see if
> > > > their DRC
> > > > state are inactive. If all of them are inactive return
> > > > 'true', 'false'
> > > > otherwise. This function is being called inside the
> > > > 'spapr_lmb_release'
> > > > callback, replacing the role of the 'sPAPRDIMMState'
> > > > opaque. The
> > > > 'sPAPRDIMMState' struct was removed from the code given that
> > > > there are
> > > > no more uses for it.
> > > >
> > > > After all these changes, there are no roles left for the
> > > > 'detach_cb_opaque'
> > > > attribute of the 'sPAPRDRConnector' as well, so we can safely
> > > > remove
> > > > it from the code too.
> > > >
> > > > Signed-off-by: Daniel Henrique Barboza
> > > > <address@hidden
> > > > <mailto:address@hidden>>
> > > > ---
> > > > hw/ppc/spapr.c | 46
> > > > +++++++++++++++++++++++++++++++++-------------
> > > > hw/ppc/spapr_drc.c | 16 +++++-----------
> > > > hw/ppc/spapr_pci.c | 4 ++--
> > > > include/hw/ppc/spapr_drc.h | 6 ++----
> > > > 4 files changed, 42 insertions(+), 30 deletions(-)
> > > >
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index bc11757..8b9a6cf 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -1887,21 +1887,43 @@ static void spapr_drc_reset(void
> > > > *opaque)
> > > > }
> > > > }
> > > >
> > > > -typedef struct sPAPRDIMMState {
> > > > - uint32_t nr_lmbs;
> > > > -} sPAPRDIMMState;
> > > > +static bool spapr_all_lmbs_drcs_released(PCDIMMDevice *dimm)
> > > > +{
> > > > + Error *local_err = NULL;
> > > > + PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> > > > + MemoryRegion *mr = ddc->get_memory_region(dimm);
> > > > + uint64_t size = memory_region_size(mr);
> > > > +
> > > > + uint64_t addr;
> > > > + addr = object_property_get_int(OBJECT(dimm),
> > > > PC_DIMM_ADDR_PROP, &local_err);
> > > > + if (local_err) {
> > > > + error_propagate(&error_abort, local_err);
> > > > + return false;
> > > > + }
> > > > + uint32_t nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> > > >
> > > > -static void spapr_lmb_release(DeviceState *dev, void *opaque)
> > > > + sPAPRDRConnector *drc;
> > > > + int i = 0;
> > > > + for (i = 0; i < nr_lmbs; i++) {
> > > > + drc =
> > > > spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > > > + addr / SPAPR_MEMORY_BLOCK_SIZE);
> > > > + g_assert(drc);
> > > > + if (drc->indicator_state !=
> > > > SPAPR_DR_INDICATOR_STATE_INACTIVE) {
> > > > + return false;
> > > > + }
> > > > + addr += SPAPR_MEMORY_BLOCK_SIZE;
> > > > + }
> > > > + return true;
> > > > +}
> > > > +
> > > > +static void spapr_lmb_release(DeviceState *dev)
> > > > {
> > > > - sPAPRDIMMState *ds = (sPAPRDIMMState *)opaque;
> > > > HotplugHandler *hotplug_ctrl;
> > > >
> > > > - if (--ds->nr_lmbs) {
> > > > + if (!spapr_all_lmbs_drcs_released(PC_DIMM(dev))) {
> > > > return;
> > > > }
> > > >
> > > >
> > > > I am concerned about the number of times we walk the DRC list
> > > > corresponding to each DIMM device. When a DIMM device is being
> > > > removed, spapr_lmb_release() will be invoked for each of the LMBs
> > > > of that DIMM. Now in this scheme, we end up walking through all
> > > > the DRC objects of the DIMM from every LMB's release function.
> > >
> > > Hi Bharata,
> > >
> > >
> > > I agree, this is definitely a poorer performance than simply
> > > decrementing ds->nr_lmbs.
> > > The reasons why I went on with it:
> > >
> > > - hot unplug isn't an operation that happens too often, so it's
> > > not terrible
> > > to have a delay increase here;
> > >
> > > - it didn't increased the unplug delay in an human noticeable way,
> > > at least in
> > > my tests;
> > >
> > > - apart from migrating the information, there is nothing much we
> > > can do in the
> > > callback side about it. The callback isn't aware of the current
> > > state of the DIMM
> > > removal process, so the scanning is required every time.
> > >
> > >
> > > All that said, assuming that the process of DIMM removal will
> > > always go through
> > > 'spapr_del_lmbs', why do we need this callback? Can't we simply do
> > > something
> > > like this in spapr_del_lmbs?
> > >
> > >
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index cd42449..e443fea 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -2734,6 +2734,20 @@ static void spapr_del_lmbs(DeviceState
> > > *dev, uint64_t addr_start, uint64_t size,
> > > addr += SPAPR_MEMORY_BLOCK_SIZE;
> > > }
> > >
> > > + if (!spapr_all_lmbs_drcs_released(PC_DIMM(dev))) {
> > > + // something went wrong in the removal of the LMBs.
> > > + // propagate error and return
> > > + throw_error_code;
> > > + return;
> > > + }
> > >
> > >
> > > spapr_del_lmbs() is called from ->unplug_request(). Here we notify
> > > the guest about the unplug request. We have to wait till the guest
> > > gives us a go ahead so that we can cleanup the DIMM device. The
> > > cleanup is done as part of release callback (spapr_lmb_release) at
> > > which point we are sure that the given LMB has been indeed removed
> > > by the guest.
> >
> > I wasn't clear enough in my last comment. Let me rephrase. Is there any
> > other use for
> > the 'spapr_lmb_release' callback function other than being called by the
> > spapr_del_lmbs()
> > in the flow you've stated above? Searching the master code now I've
> > found:
> >
> >
> > $ grep -R 'spapr_lmb_release' .
> > ./spapr.c:static void spapr_lmb_release(DeviceState *dev, void *opaque)
> > ./spapr.c: drck->detach(drc, dev, spapr_lmb_release, ds, errp);
> >
> >
> > Note that all the callback is doing is asserting that a nr_lmb counter
> > will be zero after
> > a decrement and, if true, execute the following:
> >
> >
> > hotplug_ctrl = qdev_get_hotplug_handler(dev);
> > hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> >
> >
> > So, if the callback spapr_lmb_release is only being called in the
> > following for loop of spapr_del_lmbs()
> > to clean up each LMB DRC, can't we get rid of it and do the following
> > after this for loop?
> >
> > for (i = 0; i < nr_lmbs; i++) {
> > drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
> > addr / SPAPR_MEMORY_BLOCK_SIZE);
> > g_assert(drc);
> >
> > drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > drck->detach(drc, dev, ds, errp);
> > addr += SPAPR_MEMORY_BLOCK_SIZE;
> > }
> >
> > if (!spapr_all_lmbs_drcs_released(PC_DIMM(dev))) {
> > // All LMBs were cleared, proceed with detach
> > hotplug_ctrl = qdev_get_hotplug_handler(dev);
> > hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
> > }
> > // proceed with spapr_del_lmbs code
> >
> >
> > Doesn't this code does exactly the same thing that the callback does
> > today? Note that we can
> > even use that conditional to block the remaining spapr_del_lmbs code
> > from executing if the
> > LMBs weren't properly cleansed - something that today isn't done.
> >
> >
> > If removing this callback is too problematic or can somehow cause
> > problems that I am unable to
> > foresee, then the alternative would be to either deal with the scanning
> > inside the callback
> > (as it is being done in this patch) or migrate the nr_lmbs information
> > for late retrieval in
> > the callback. I am fine with any alternative, we just need to agree on
> > what makes more
> > sense.
> >
> >
> > Daniel
> >
> > >
> > > Regards,
> > > Bharata.
> >
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature
Re: [Qemu-ppc] [Qemu-devel] [PATCH 2/5] hw/ppc: removing spapr_drc_detach_cb opaques, David Gibson, 2017/05/04