qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: cxl nvdimm Potential probe ordering issues.


From: Dan Williams
Subject: Re: cxl nvdimm Potential probe ordering issues.
Date: Fri, 20 Jan 2023 14:41:05 -0800

Gregory Price wrote:
> On Fri, Jan 20, 2023 at 09:38:13AM -0800, Dan Williams wrote:
> > As it stands currently that dax device and the cxl device are not
> > related since a default dax-device is loaded just based on the presence
> > of an EFI_MEMORY_SP address range in the address map. With the new ram
> > enabling that default device will be elided and CXL will register a
> > dax-device parented by a cxl region.
> > 
> > >    - The memory *does not* auto-online, instead the dax device can be
> > >      onlined as system-ram *manually* via ndctl and friends
> > 
> > That *manually* part is the problem that needs distro help to solve. It
> > should be the case that by default all Linux distributions auto-online
> > all dax-devices. If that happens to online memory that is too slow for
> > general use, or too high-performance / precious for general purpose use
> > then the administrator can set policy after the fact. Unfortunately user
> > policy can not be applied if these memory ranges were onlined by the
> > kernel at boot , so that's why the kernel policy defaults to not-online.
> > 
> > In other words, there is no guarantee that memory that was assigned to
> > the general purpose pool at boot can be removed. The only guaranteed
> > behavior is to never give the memory to the core kernel in the first
> > instance and always let user policy route the memory.
> > 
> > > 3) The code creates an nvdimm_bridge IFF a CFMW is defined - regardless
> > >    of the type-3 device configuration (pmem-only or vmem-only)
> > 
> > Correct, the top-level bus code (cxl_acpi) and the endpoint code
> > (cxl_mem, cxl_port) need to handshake before establishing regions. For
> > pmem regions the platform needs to claim the availability of a pmem
> > capable CXL window.
> > 
> > > 4) As you can see above, multiple decoders are registered.  I'm not sure
> > >    if that's correct or not, but it does seem odd given there's only one
> > >    cxl type-3 device.  Odd that decoder0.0 shows up when CFMW is there,
> > >    but not when it isn't.
> > 
> > CXL windows are modeled as decoders hanging off the the CXL root device
> > (ACPI0017 on ACPI based platforms). An endpoint decoder can then map a
> > selection of that window.
> > 
> > > Don't know why I haven't thought of this until now, but is the CFMW code
> > > reporting something odd about what's behind it?  Is it assuming the
> > > devices are pmem?
> > 
> > No, the cxl_acpi code is just advertising platform decode possibilities
> > independent of what devices show up. Think of this like the PCI MMIO
> > space that gets allocated to a root bridge at the beginning of time.
> > That space may or may not get consumed based on what devices show up
> > downstream.
> 
> Thank you for the explanation Dan, and thank you for you patience
> @JCameron.  I'm fairly sure I grok it now.
> 
> Summarizing to make sure: the cxl driver is providing what would be the
> CXL.io (control) path, and the CXL.mem path is basically being simulated
> by what otherwise would be a traditional PCI memory region. This explains
> why turning off Legacy mode drops the dax devices, and why the topology
> looks strange - the devices are basically attached in 2 different ways.
> 
> Might there be interest from the QEMU community to implement this
> legacy-style setup in the short term, in an effort to test the the
> control path of type-3 devices while we wait for the kernel to catch up?
> 
> Or should we forget this mode ever existed and just barrel forward
> with HDM decoders and writing the kernel code to hook up the underlying
> devices in drivers/cxl?

Which mode are you referring?

The next steps for the kernel enabling relevant to this thread are:

* ram region discovery (platform firmware or kexec established)
* ram region creation
* pmem region discovery (from labels)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]