From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation
Date: Wed, 24 Jul 2013 14:41:36 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7

On 24/07/2013 13:34, Igor Mammedov wrote:
> On Wed, 24 Jul 2013 11:41:04 +0200
> Paolo Bonzini <address@hidden> wrote:
> 
>> On 24/07/2013 10:36, Igor Mammedov wrote:
>>> On Tue, 23 Jul 2013 19:09:26 +0200
>>> Paolo Bonzini <address@hidden> wrote:
>>>
>>>> On 23/07/2013 18:23, Igor Mammedov wrote:
>>>>> - if the slot property is not specified on the -device/device_add command,
>>>>> treat the default value as a request to assign the DimmDevice to
>>>>> the first free slot.
>>>>
>>>> Even with "-m" instead of "-numa mem", I think this is problematic
>>>> because we still need to separate the host and guest parts of the DIMM
>>>> device.  "-numa mem" (or the QMP command that Wanlong added) will be
>>>> necessary to allocate memory on the host side before adding a DIMM.
>>> why not do the host allocation part at the same time the DIMM is added? Is
>>> there a real need to separate the DIMM device?
>>>
>>> I'm probably missing something, but the -numa mem option and co. aside, what
>>> problem couldn't be solved during DIMM device initialization and would
>>> require a split DIMM device?
>>
>> Because otherwise, every option we add to "-numa mem" will have to be
>> added to "-device dimm".  For example,
>>
>>    -device dimm,policy=interleave
> if it's a feature of the DIMM device, sure; if it is not, let's find a better
> place for it. See below for an alternative approach.
> 
>>
>> makes no sense to me.
>>
>> In fact, this is no different from having to do drive_add or netdev_add
>> before device_add.  First you tell QEMU about the host resources to use,
>> then you add the guest device and bind the device to those resources.
>>
>>>> So slots will have three states: free (created with "-m"), allocated (a
>>>> free slot moves to this state with "-numa mem...,populated=no" when
>>>> migrating, or with the QMP command for regular hotplug), populated (an
>>>> allocated slot moves to this state with "-device dimm").
>>>>
>>>> You would be able to plug a DIMM only into an allocated slot, and the
>>>> size would be specified on the slot rather than the DIMM device.
>>> the 'slot' property is there only for migration's sake, to provide a stable
>>> numeric ID for the QEMU<->ACPI BIOS interface. It's not used for any other
>>> purpose and wasn't intended for any other usage.
>>
>> How would you otherwise refer to the memory you want to affect in a
>> set-mem-policy monitor command?
> it could be the 'id' property, or even better a QOM path
> 
>>
>>> on bare metal a slot has nothing to do with the size of the plugged-in DIMM,
>>
>> On bare metal, slots also belong to a specific NUMA node, for what it's
>> worth.  There are going to be differences from bare metal no matter what.
> sure, we can deviate here, but I don't see the full picture yet, so I'm trying
> to find justification for it first and asking questions. Maybe a better
> solution will be found.
> 
>>
>>> why would we
>>> model it another way if it only brings problems, like a predefined size,
>>
>> It doesn't have to be predefined.  In the previous discussions (and also
>> based on Vasilis and Hu Tao's implementations) I assumed predefined slot
>> sizes.  Now I understand the benefit of having a simpler command line
>> with "-m", but then in return you need three slot states instead of just
>> unpopulated/populated.
>>
>> So you'd just do
>>
>>    set-mem-policy 0 size=2G      # free->allocated
>>    device_add dimm,slotid=0      # allocated->populated
>>
>> to hotplug a 2G DIMM.  And you'll be able to pin it to host NUMA nodes,
>> and assign it to guest NUMA nodes, like this:
>>
>>    set-mem-policy 0 size=2G,nodeid=1,policy=membind host-nodes=0-1
>>    device_add dimm,slotid=0
> Do policy and other -numa mem properties belong to a particular DIMM device
> or rather to a particular NUMA node?
> 
> How about the following idea: a guest node maps to a specific host node; then,
> when we plug a DIMM, the guest node provides information on policies and so on
> to the creator of the DIMM device (via DimmBus and/or mhc), which allocates
> memory, applies policies, and binds the new memory to a specific host node.
> That would eliminate the two-stage approach.

It makes sense.  My main worry is not to deviate from what we've been
doing for drives and netdevs (because that's a proven design).  Both
"-numa mem" and this proposal satisfy that goal.

I originally proposed "-numa mem" because Vasilis and Hu's patches were
relying on specifying predefined sizes for all slots.  So "-numa mem"
was a good fit for both memory hotplug (done Hu's way) and NUMA policy.
It also simplified the command line, which had a lot of "mem-" prefixed
options.

With the approach you suggest it may not be necessary at all, and we can
go back to just "-numa
node,cpus=0,mem=1G,mem-policy=membind,mem-hostnodes=0-1,cpu-hostnodes=0"
or something like that.
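
Spelled out for two guest nodes, that might look like the sketch below; note
that mem-policy, mem-hostnodes, and cpu-hostnodes are only the spellings
proposed here, not existing options:

    qemu-system-x86_64 ... \
        -numa node,nodeid=0,cpus=0,mem=1G,mem-policy=membind,mem-hostnodes=0,cpu-hostnodes=0 \
        -numa node,nodeid=1,cpus=1,mem=1G,mem-policy=membind,mem-hostnodes=1,cpu-hostnodes=1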

Whether it is workable depends on what granularity Wanlong/Hu want.

There may be some scenarios where per-slot policies make sense.  For
example, imagine that in general you want memory to be bound to the
corresponding host node.  It turns out some nodes are now fully
committed and others are free, and you need more memory on a VM.  You
can hotplug that memory without really caring about binding, and temporarily
suffer some performance loss.
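
With the commands sketched above, that scenario would be just the following
(slot number illustrative; leaving out the policy and host-nodes leaves the
new memory unbound):

    set-mem-policy 1 size=2G      # no binding requested
    device_add dimm,slot=1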

I agree that specifying the policy on every hotplug complicates
management and may be overkill.  But then, most guests are not NUMA at
all and you would hardly perceive the difference; you would just have to
split the hotplug into two steps:

    set-mem-policy 0 size=2G
    device_add dimm,slot=0

instead of

    device_add dimm,slot,size=2G

which is not a big chore.
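
For a management tool, the same two steps would presumably map onto QMP.  A
rough sketch, assuming the proposed set-mem-policy command grew a QMP
counterpart with these argument names (none of this exists today):

    { "execute": "set-mem-policy",
      "arguments": { "slot": 0, "size": 2147483648 } }
    { "execute": "device_add",
      "arguments": { "driver": "dimm", "slot": 0 } }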

> in this case the DIMM device only needs to specify where it's plugged in,
> using a 'node' property (now a number, but it could become a QOM path to the
> NUMA node object).

Yeah, then it's the same as the id.

Paolo

> Ideally it would be a QOM hierarchy:
> 
> /nodeX/@dimmbus/dimm_device
> where even the 'node' property would become obsolete; just specify the right
> bus to attach the DIMM device to.
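
In that model, plugging a DIMM would presumably come down to naming the
per-node bus; a hypothetical spelling, following the hierarchy above:

    device_add dimm,bus=/node1/@dimmbus,id=dimm0
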
> 
> PS:
> we need a similar QOM hierarchy for CPUs as well, to sort out the
> -numa cpus=ids mess.
> 
>>
>> Again, this is the same as drive_add/device_add.
>>
>> Paolo
>>
>>> allocated, free, etc. I think a slot should be either free or busy.
>>>
>>>
>>>>
>>>> In general, I don't think free slots should be managed by the DimmBus,
>>>> and host vs. guest separation should be there even if we accept your
>>>> "-m" extension (doesn't look bad at all, I must say).
>>>>
>>>> Paolo
>>>
>>
> 



