qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2 1/1] target-i386: Remove icc_bridge parameter


From: Eduardo Habkost
Subject: Re: [Qemu-devel] [PATCH v2 1/1] target-i386: Remove icc_bridge parameter from cpu_x86_create()
Date: Wed, 11 Mar 2015 10:20:38 -0300
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Mar 11, 2015 at 12:59:33PM +0100, Andreas Färber wrote:
> Am 11.03.2015 um 12:11 schrieb Eduardo Habkost:
> > On Tue, Mar 10, 2015 at 11:43:41PM +0100, Andreas Färber wrote:
> >> Am 10.03.2015 um 22:57 schrieb Eduardo Habkost:
> >>> Instead of passing icc_bridge from the PC initialization code to
> >>> cpu_x86_create(), make the PC initialization code attach the CPU to
> >>> icc_bridge.
> >>>
> >>> The only difference here is that icc_bridge attachment will now be done
> >>> after x86_cpu_parse_featurestr() is called. But this shouldn't make any
> >>> difference, as property setters shouldn't depend on icc_bridge.
> >>>
> >>> Signed-off-by: Eduardo Habkost <address@hidden>
> >>> ---
> >>> Changes v1 -> v2:
> >>>  * Keep existing check for NULL icc_bridge and error reporting, instead
> >>>    of assing assert(icc_bridge)
> >>> ---
> >>>  hw/i386/pc.c      | 13 +++++++++++--
> >>>  target-i386/cpu.c | 14 ++------------
> >>>  target-i386/cpu.h |  3 +--
> >>>  3 files changed, 14 insertions(+), 16 deletions(-)
> >>>
> >>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> >>> index b5b2aad..a26e0ec 100644
> >>> --- a/hw/i386/pc.c
> >>> +++ b/hw/i386/pc.c
> >>> @@ -992,18 +992,27 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, 
> >>> int level)
> >>>  static X86CPU *pc_new_cpu(const char *cpu_model, int64_t apic_id,
> >>>                            DeviceState *icc_bridge, Error **errp)
> >>>  {
> >>> -    X86CPU *cpu;
> >>> +    X86CPU *cpu = NULL;
> >>>      Error *local_err = NULL;
> >>>  
> >>> -    cpu = cpu_x86_create(cpu_model, icc_bridge, &local_err);
> >>> +    if (icc_bridge == NULL) {
> >>> +        error_setg(&local_err, "Invalid icc-bridge value");
> >>> +        goto out;
> >>> +    }
> >>> +
> >>> +    cpu = cpu_x86_create(cpu_model, &local_err);
> >>
> >> We had previously discussed reference counting. Here I would expect:
> > 
> > I will try to extend the analysis with ownership of each reference:
> > 
> >>
> >> OBJECT(cpu)->ref == 1
> > 
> > And the owner of the reference is pc_new_cpu() (cpu variable).
> > 
> >>
> >>>      if (local_err != NULL) {
> >>>          error_propagate(errp, local_err);
> >>>          return NULL;
> >>>      }
> >>>  
> >>> +    qdev_set_parent_bus(DEVICE(cpu), qdev_get_child_bus(icc_bridge, 
> >>> "icc"));
> >>
> >> OBJECT(cpu)->ref == 2
> > 
> > And the owners are: pc_new_cpu()/cpu and icc_bridge.
> > 
> > Now, what if we error out and destroy the CPU after we already added the
> > CPU to icc_bridge? Is icc_bridge going to keep pointing to the dead
> > object, or is there some bus-detach magic somewhere that will make it
> > work?
> 
> Lowering the ref count to 0 will trigger object_finalize(), which via
> object_property_del_all() triggers unparenting via
> object_finalize_child_property(), which in the device case causes
> unrealization of the device and unparenting of its owned buses
> (qdev.c:device_unparent()).

OK, I see. (See my comment about the chicken-egg problem below)

> 
> >>> +    object_unref(OBJECT(cpu));
> >>
> >> OBJECT(cpu)->ref == 1
> > 
> > Here pc_new_cpu() is dropping its reference too early! In practice it is
> > now borrowing the reference owned by icc_bridge, and I don't think we
> > should do that.
> > 
> > I just kept the object_unref() call here because I didn't want to change
> > any behavior when moving code, but I think it doesn't belong here.
> 
> Agree. In my patches I placed it after the last usage of cpu in this
> function, but actually since it is the return value it would even need
> to go into the callers. pc_cpus_init() is currently ignoring the return
> value.

Is any caller actually using the return value for anything, in your
series?

> 
> For using the CPU as a child<> of a CPU core object that changes in my
> series, so I'll fix that.
> 
> >>> +
> >>>      object_property_set_int(OBJECT(cpu), apic_id, "apic-id", &local_err);
> >>>      object_property_set_bool(OBJECT(cpu), true, "realized", &local_err);
> >>
> >> OBJECT(cpu)->ref == 1 or 2 depending on DeviceClass::realize :)
> > 
> > If it's 2, it won't be our problem as we don't own the extra reference.
> > It's the responsibility of whoever grabbed the extra reference.
> > 
> > But I assume the property setters above MUST not add any extra reference
> > in case they return an error. Correct?
> 
> So, the extra reference would be the one added by device_set_realized()
> for /machine/unattached parent. That happens as very first thing on
> false -> true transition. If an error occurs either in
> DeviceClass::realize or later in device_set_realized() then that
> reference will remain, but it is re-entrant in only doing so once.

Is that reference really supposed to remain if realize fails?

> 
> I have refreshing and combining the two competing qom-tree HMP patches
> on my radar and am starting to think that reference counting information
> may be a third interesting mode of output...
> 
> >>>  
> >>> +out:
> >>>      if (local_err) {
> >>>          error_propagate(errp, local_err);
> >>>          object_unref(OBJECT(cpu));
> >>
> > 
> > And here we have something that was already broken: X86CPU instance_init
> > calls cpu_exec_init(), the CPU is added to the global CPU list without
> > increasing reference counting, and the global list will point to a
> > destroyed object if we enter the error path.
> > 
> > In other words, if anything fails after cpu_exec_init() is called, we're
> > screwed. In the future it will be on realize, but we probably need to
> > move it closer to the end of realize.
> 
> Yeah, most of the error handling kind of assumes that when an error
> occurs we report the Error* upwards and exit QEMU, with the user
> restarting from scratch. With CPU hotplug that is a nasty assumption.

Exactly.

> 
> >> object_unref(NULL) looks unusual but is valid.
> > 
> > Yes. Makes the code simpler. :)
> > 
> >>
> >> Should we change the return NULL to jump here, too, then?
> > 
> > Yes, the cpu_x86_create() error check could just do a "goto out".
> 
> I assume you are planning to apply this patch through your tree? Will
> you submit a v3? Otherwise can you tweak when applying and let me know
> so I can cherry-pick? Thanks.

I plan to apply this through my tree once I get at least one Reviewed-by
(preferably from you). I will submit an extra patch for the additional
"goto out" as a reply to this thread.

> 
> >> OBJECT(cpu)->ref == 0 or 1
> >>
> >> I wonder whether we need another object_unref(OBJECT(cpu)) for the
> >> non-error case, either here or in the callers? Out of scope for this
> >> patch, of course.
> > 
> > So, how I see it: if we are returning a reference to the object, now it
> > belongs to the caller, and it should be the caller responsibility to
> > call object_unref(). So the non-error object_unref() after
> > qdev_set_parent_bus() is not supposed to be in pc_new_cpus(), but in the
> > callers.
> 
> Yeah, agree.
> 
> > Either way we choose, we should document it so callers know who
> > owns the reference they get.
> > 
> > Alternatively, we could simply change pc_new_cpu() to _not_ return a
> > pointer to the CPU, and make pc_cpus_init() deal with the APIC MMIO
> > mapping using some another approach.
> 
> No, that won't work for me.

If you have a caller that still needs the return value, that's OK. But
I'm confused as you said that pc_cpus_init() is ignoring the return
value in your patches.

> 
> But my question was rather whether changes for the error case will be
> needed? If the reference for /machine/unassigned/device[n] remains
> around, then our local object_unref() won't unparent/finalize the
> device, meaning it stays around. If we do an additional unref then
> unparenting would make the ref count go to -1. So we may need to
> unparent it explicitly here in this error path? Hard to test.

I guess we need to make the rules about who own each reference
clearer[1]. Right now it is a chicken-egg problem, even for icc_bus:
unparent will happen only on object_unref(), but icc_bus will owns a
reference and will drop it only when the device is unparented. So it
looks like explicit unparent is really required.

---

[1] The way I see it, the general rule should be: if your you provide a
function that will increase refcount, you own the reference and you need
to provide a mechanism to undo that (instead of requiring callers to own
the reference and drop it for you). Alternatively, if you don't want to
provide an undo mechanism, then simply don't grab a reference and undo
your stuff automatically on finalize.

I really don't see other alternatives: either you own the reference and
you take care of it completely, or you simply don't grab one (and make
sure you stop pointing to the object on finalize).

-- 
Eduardo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]