Re: [PATCH v5 00/43] CXl 2.0 emulation Support
From: Michael S. Tsirkin
Subject: Re: [PATCH v5 00/43] CXl 2.0 emulation Support
Date: Fri, 4 Feb 2022 09:03:23 -0500
On Wed, Feb 02, 2022 at 02:09:54PM +0000, Jonathan Cameron wrote:
> Changes since v4:
> https://lore.kernel.org/linux-cxl/20220124171705.10432-1-Jonathan.Cameron@huawei.com/
>
> Note documentation patch that Alex requested to follow.
> I don't want to delay getting this out as Alex mentioned possibly
> having time to continue reviewing in latter part of this week.
>
> Issues identified by CI / Alex Bennée
> - Stubs added for hw/cxl/cxl-host and hw/acpi/cxl plus related meson
> changes to use them as necessary.
> - Drop uid from cxl-test (result of last minute change in v4 that was not
> carried through to the test)
> - Fix naming clash with field name ERROR which on some arches is defined
> and results in the string being replaced with 0 in some of the
> register field related defines. Call it ERR instead.
> - Fix type issue around mr->size by using 64 bit accessor functions.
> - Add a new patch to exclude pxb-cxl from device-crash-test in similar
> fashion to pxb.
>
> CI tests now passing with exception of checkpatch which has what
> I think is a false positive and build-oss-fuzz which keeps timing out.
> https://gitlab.com/jic23/qemu/-/pipelines/460109208
> There were a few tweaks to patch descriptions after I pushed that
> out (I missed a few RB from Alex).
There's an RFC patch that needs review from core memory maintainers,
so I guess not all of it is for merge just yet?
Is there any way we can start applying this patchset gradually?
> Other changes (mostly from Alex's review)
> - Change component register handling to now report UNIMP and return 0
> for 8 byte registers as we currently don't implement any of them.
> Note that this means we need a kernel fix:
>
> https://lore.kernel.org/linux-cxl/20220201153437.2873-1-Jonathan.Cameron@huawei.com/
> - Drop majority of the macros used in defining mailbox handlers in
> favour of written out code.
> - Use REG64 where appropriate. This was introduced whilst this set
> has been under development so I missed it.
> - Clarify some register access options wrt CXL 2.0 Errata F4.
> - Change timestamp to qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
> - Use typed enums to enforce types of function arguments.
> - Default to cxl being off in machine_class_init() removing
> need to set it to off in machines where there is no support as yet.
> - Add Alex's RB where given.
>
> Looking in particular for:
> * Review of the PCI interactions
> * x86 and ARM machine interactions (particularly the memory maps)
> * Review of the interleaving approach - is the basic idea
> acceptable?
> * Review of the command line interface.
> * CXL related review welcome but much of that got reviewed
> in earlier versions and hasn't changed substantially.
>
> Big TODOs:
>
> * Interleave boundary issues. I haven't yet solved this but didn't
> want to further delay the review of the rest of the series.
>
> * Volatile memory devices (easy but it's more code so left for now).
> * Switch support. Linux kernel support is under review currently,
> so there is now something to test against.
> * Hotplug? May not need much but it's not tested yet!
> * More tests and tighter verification that values written to hardware
> are actually valid - stuff that real hardware would check.
> * Testing, testing and more testing. I have been running a basic
> set of ARM and x86 tests on this, but there is always room for
> more tests and greater automation.
> * CFMWS flags as requested by Ben.
>
> Why do we want QEMU emulation of CXL?
>
> As Ben stated in V3, QEMU support has been critical to getting OS
> software written given lack of availability of hardware supporting the
> latest CXL features (coupled with very high demand for support being
> ready in a timely fashion). What has become clear since Ben's v3
> is that the situation is a continuous one. Whilst we can't talk about
> them yet, CXL 3.0 features and OS support have been prototyped on
> top of this support and a lot of the ongoing kernel work is being
> tested against these patches. The kernel CXL mocking code allows
> some forms of testing, but QEMU provides a more versatile and
> extensible platform.
>
> Other features on the qemu-list that build on these include PCI-DOE
> /CDAT support from the Avery Design team further showing how this
> code is useful. Whilst not directly related this is also the test
> platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
> utilizes and extends those technologies and is likely to be an early
> adopter.
> Refs:
> CMA Kernel:
> https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> CMA Qemu:
> https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> DOE Qemu:
> https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
>
> As can be seen there is non trivial interaction with other areas of
> Qemu, particularly PCI and keeping this set up to date is proving
> a burden we'd rather do without :)
>
> Ben mentioned a few other good reasons in v3:
> https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
>
> The evolution of this series perhaps leaves it in a less than
> entirely obvious order and that may get tidied up in future postings.
> I'm also open to this being considered in bite sized chunks. What
> we have here is about what you need for it to be useful for testing
> current kernel code. Note the kernel code is moving fast so
> since v4, some features have been introduced we don't yet support in
> QEMU (e.g. use of the PCIe serial number extended capability).
>
> All comments welcome.
>
> qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
> -m 4g,maxmem=8G,slots=8 \
> ...
> -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M \
> -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M \
> -object memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M \
> -object memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
> -object memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
> -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> -device cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M \
> -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
> -device cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M \
> -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
> -device cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M \
> -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
> -device cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M \
> -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
> -cxl-fixed-memory-window targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k
>
> First CFMWS suitable for up to 2 way interleave, the second for 4 way (2 way
> at host level and 2 way at the host bridge).
> targets=<range of pxb-cxl uids> , multiple entries if range is disjoint.
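To make the relationship between the two window options and the "2 way" / "4 way" figures concrete, here is a minimal Python sketch. It is not QEMU's option parser: parse_size() and parse_window() are hypothetical helpers written for illustration, binary size units are assumed, and the figure of two root ports per host bridge is taken from the command line above.

```python
# Illustrative only: a hypothetical parser for the key=value lists passed to
# -cxl-fixed-memory-window in the command line above. Not QEMU code.

def parse_size(text):
    """Convert a size such as '4G' or '8k' to bytes (binary units assumed)."""
    units = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)

def parse_window(option):
    """Split the option string; repeated 'targets' keys accumulate in order."""
    window = {"targets": []}
    for item in option.split(","):
        key, value = item.split("=", 1)
        if key == "targets":
            window["targets"].append(value)
        elif key in ("size", "interleave-granularity"):
            window[key] = parse_size(value)
        else:
            raise ValueError(f"unknown key: {key}")
    # Host-level interleave ways equals the number of host bridges targeted.
    window["ways"] = len(window["targets"])
    return window

ROOT_PORTS_PER_BRIDGE = 2  # as configured in the example command line

first = parse_window("targets=cxl.1,size=4G,interleave-granularity=8k")
second = parse_window(
    "targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k")

for name, win in (("first", first), ("second", second)):
    max_ways = win["ways"] * ROOT_PORTS_PER_BRIDGE
    print(name, "host ways:", win["ways"], "max total ways:", max_ways)
```

With one host bridge the window can reach at most two devices, and with two host bridges it can reach four, which matches the description above.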
>
> With the v5.17-rc1 + patch series listed below.
>
> cd /sys/bus/cxl/devices/
> region=$(cat decoder0.1/create_region)
> echo $region > decoder0.1/create_region
> ls -lh
>
> //Note the order of devices and adjust the following to make sure they
> //are in order across the 4 root ports. Easy to do in a tool, but
> //not easy to paste in a cover letter.
>
> cd region0.1\:0
> echo 4 > interleave_ways
> echo mem2 > target0
> echo mem3 > target1
> echo mem0 > target2
> echo mem1 > target3
> echo $((1024<<20)) > size
> echo 4096 > interleave_granularity
> echo region0.1:0 > /sys/bus/cxl/drivers/cxl_region/bind
>
> Tested with devmem2 and files with known content.
> Kernel tree is mainline + (I based on 5.17-rc1)
> [PATCH] cxl/regs: Fix size of CXL Capabilty Header Register
> https://lore.kernel.org/linux-cxl/20220201182934.jjvavjsf4h7oqngv@intel.com/T/#t
>
> [PATCH v3 00/40] CXL.mem Topology Discovery and Hotplug Support
> https://lore.kernel.org/linux-cxl/164298411792.3018233.7493009997525360044.stgit@dwillia2-desk3.amr.corp.intel.com/
> Note that series has a lot of v4/v5 patches posted as replies, but b4 does
> a good job of pulling out the latest.
>
> [PATCH 0/2] cxl/port: Robustness fixes for decoder enumeration
> https://lore.kernel.org/linux-cxl/164317463887.3438644.4087819721493502301.stgit@dwillia2-desk3.amr.corp.intel.com/
>
> [PATCH 0/4] Unify meaning of interleave attributes
> https://lore.kernel.org/linux-cxl/20220127212911.127741-1-ben.widawsky@intel.com/
>
> [PATCH v3 00/14] CXL Region driver
> https://lore.kernel.org/linux-cxl/20220128002707.391076-1-ben.widawsky@intel.com/
>
> What follows is a first attempt at explaining how all these components
> fit together. I'll write up some formal documentation shortly.
>
> Memory Address Map for CXL elements. Note where exactly these regions
> appear is Arch and platform dependent.
>
> Base somewhere far up in the Host PA map.
> _______________________________
> | |
> | CXL Host Bridge 0 Registers |
> | CXL Host Bridge 1 Registers |
> | ... | This bit is normal MMIO register space.
> | CXL Host bridge N registers | including programmable interleave decoders
> |______________________________| for interleave across root ports.
> | |
> ....
> | |
> |______________________________|
> | |
> | CFMW 0, | Note that there can be multiple regions
> | Interleave 2 way, targets | of memory within this 1TB which can be
> | Hostbridge 0, Hostbridge 1 | interleaved differently: in the host bridges
> | Granularity 16KiB, 1TB | across root ports or in switches below the
> |______________________________| root ports.
> | |
> | CFMW 1, |
> | Interleave 1 way, target |
> | Hostbridge 0, 512GiB |
> |______________________________|
> etc for all interleave combinations
> configured, or built in to the
> system before any generic software
> sees it.
>
> System Topology considering CFMW 0 only to keep this simple.
> x marks the match in each decoder level.
> Switches have more interleave decoders and other features
> that we haven't implemented yet in QEMU.
>
> Address Read to CFMW0 base + N
> _________________|________________
> | |
> | Host interconnect |
> | Configured to route CFM |
> | memory access to particular HB |
> |_____x____________________________|
> | |
> Interleave Decoder |
> Matches this HB |
> | |
> _______|__________ _____|____________
> | | | |
> | CXL HB 0 | | CXL HB 1 | Only exist in PCI (mostly)
> | HB IntLv Decoder | | HB IntLv Decoder | via ACPI description
> | PCI Root Bus 0c | | PCI Root Bus de |
> |x_________________| |__________________| In CXL have MMIO
> | | | | at location given in CEDT
> | | | | CHBS entry (ACPI)
> ____________|___ __________|__ __|_________ ___|_________
> | Root Port 0 | | Root Port 1 | | Root Port 2| | Root Port 3 |
> | Appears in | | Appears in | | Appears in | | Appear in |
> | PCI topology | | PCI Topology| | PCI Topo | | PCI Topo |
> | As 0c:00.0 | | as 0c:01.0 | | as de:00.0 | | as de:01.0 |
> |_______________| |_____________| |____________| |_____________|
> | | | |
> | | | |
> _____|_________ ______|______ ______|_____ ______|_______
> | x | | | | | | |
> | CXL Type3 0 | | CXL Type3 1 | | CXL type3 2| | CXL Type 3 3 |
> | | | | | | | |
> | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...) |
> | Decoder to go | | | | | | |
> | from host PA | | PCI 0e:00.0 | | PCI df:00.0| | PCI e0:00.0 |
> | to device PA | | | | | | |
> | PCI as 0d:00.0| | | | | | |
> |_______________| |_____________| |____________| |______________|
>
> Backed by Backed by Backed by Backed by
> file 0 file 1 file 2 file 3
>
> LSA backed by additional files for each device (not yet supported)
>
> So currently we have decoders as follows for each interleaved access.
> 1) CFMW decoder - fixed config so forms part of qemu command line.
> 2) Host bridge decoders - programmable decoders that the system
> software will program either based on user command or based
> on info from the Label Storage Area (not yet emulated)
> 3) Type 3 device decoders. Down to here the address used is the
> Host PA. These decoders convert to the local device PA
> (in simple case - drop some bits in the middle of the address)
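The "drop some bits in the middle of the address" step can be sketched in Python as follows. This is a simplified, flat model of one power-of-two interleave set, not the decoder code in this series: the function names, the window base address and the target ordering are all assumptions made for illustration, and real topologies split the decision across the CFMW, host bridge and device decoders.

```python
# Simplified flat model of a power-of-two interleave set. Hypothetical
# helper names; not taken from the QEMU patches.

def decode(hpa, window_base, granularity, ways):
    """Return (target index, device PA) for a host PA inside one window."""
    offset = hpa - window_base
    chunk = offset // granularity          # which granule the access hits
    target = chunk % ways                  # granules rotate across targets
    dpa = (chunk // ways) * granularity + offset % granularity
    return target, dpa

def decode_bits(hpa, window_base, granularity, ways):
    """Same result, expressed as removing the target-select bits."""
    offset = hpa - window_base
    g_bits = granularity.bit_length() - 1  # log2(granularity)
    w_bits = ways.bit_length() - 1         # log2(ways)
    target = (offset >> g_bits) & (ways - 1)
    low = offset & (granularity - 1)       # bits below the selector
    high = offset >> (g_bits + w_bits)     # bits above the selector
    return target, (high << g_bits) | low

BASE = 0x4000000000  # hypothetical window base, for illustration only

for off in (0x0, 0x1000, 0x4005, 0x5123):
    target, dpa = decode(BASE + off, BASE, 4096, 4)
    print(hex(off), "-> target", target, "dpa", hex(dpa))
```

The two functions agree whenever granularity and ways are powers of two. The bit-based form shows that the device PA is simply the host offset with the target-select bits removed.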
>
> Future patches will add decoders in switch upstream ports making
> the above diagram have another layer between root ports and
> the memory devices.
>
> Note, we've focused for now on Persistent Memory devices as they are seen
> as an early and important use case (and are the most complex one).
> But it should be straightforward to add volatile memory
> support and indeed that would be backed by RAM.
>
> lspci -tv for above shows
>
> -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host Bridge (this is the cxl PXB)
> | \-OTHER STUFF
> +-[0000:0c]-+-00.0-[0d]----00.0 Intel Corporation Device 0d93
> | \-01.0-[0e]----00.0 Intel Corporation Device 0d93
> \-[0000:de]-+-00.0-[df]----00.0 Intel Corporation Device 0d93
> \-01.0-[e0]----00.0 Intel Corporation Device 0d93
>
> Where those Intel parts are the type 3 devices.
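The bus numbers in the listing line up with the bus_nr values given to pxb-cxl on the command line once they are written in hex. The short Python sketch below only reproduces that arithmetic. The sequential numbering of the buses behind each root port is what the listing above shows, not a guarantee, since enumeration software assigns those numbers. bus_hex() and child_buses() are hypothetical helper names.

```python
# Illustrative arithmetic only: relate the decimal bus_nr options to the
# hexadecimal bus numbers printed by lspci in the listing above.

def bus_hex(bus_nr):
    """Format a PCI bus number the way lspci prints it (two hex digits)."""
    return format(bus_nr, "02x")

def child_buses(root_bus, root_ports):
    """Assume the bus behind each root port is numbered sequentially after
    the root bus, as seen in the listing (an observation, not a rule)."""
    return [bus_hex(root_bus + 1 + i) for i in range(root_ports)]

for name, bus_nr in (("cxl.1", 12), ("cxl.2", 222)):
    print(name, "root bus", bus_hex(bus_nr),
          "device buses", child_buses(bus_nr, 2))
```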
>
> All comments welcome!
>
> Particular thanks to Alex Bennée for his review of v4.
>
> Thanks,
>
> Jonathan
>
> Ben Widawsky (26):
> hw/pci/cxl: Add a CXL component type (interface)
> hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
> hw/cxl/device: Introduce a CXL device (8.2.8)
> hw/cxl/device: Implement the CAP array (8.2.8.1-2)
> hw/cxl/device: Implement basic mailbox (8.2.8.4)
> hw/cxl/device: Add memory device utilities
> hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)
> hw/cxl/device: Timestamp implementation (8.2.9.3)
> hw/cxl/device: Add log commands (8.2.9.4) + CEL
> hw/pxb: Use a type for realizing expanders
> hw/pci/cxl: Create a CXL bus type
> hw/pxb: Allow creation of a CXL PXB (host bridge)
> acpi/pci: Consolidate host bridge setup
> hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)
> hw/cxl/rp: Add a root port
> hw/cxl/device: Add a memory device (8.2.8.5)
> hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)
> acpi/cxl: Add _OSC implementation (9.14.2)
> tests/acpi: allow CEDT table addition
> acpi/cxl: Create the CEDT (9.14.1)
> hw/cxl/device: Add some trivial commands
> hw/cxl/device: Plumb real Label Storage Area (LSA) sizing
> hw/cxl/device: Implement get/set Label Storage Area (LSA)
> acpi/cxl: Introduce CFMWS structures in CEDT
> hw/cxl/component Add a dumb HDM decoder handler
> qtest/cxl: Add very basic sanity tests
>
> Jonathan Cameron (17):
> MAINTAINERS: Add entry for Compute Express Link Emulation
> tests/acpi: allow DSDT.viot table changes.
> tests/acpi: Add update DSDT.viot
> cxl: Machine level control on whether CXL support is enabled
> hw/cxl/component: Add utils for interleave parameter encoding/decoding
> hw/cxl/host: Add support for CXL Fixed Memory Windows.
> hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl
> pci/pcie_port: Add pci_find_port_by_pn()
> CXL/cxl_component: Add cxl_get_hb_cstate()
> mem/cxl_type3: Add read and write functions for associated hostmem.
> cxl/cxl-host: Add memops for CFMWS region.
> arm/virt: Allow virt/CEDT creation
> hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances pxb-cxl
> RFC: softmmu/memory: Add ops to memory_region_ram_init_from_file
> i386/pc: Enable CXL fixed memory windows
> qtest/acpi: Add reference CEDT tables.
> scripts/device-crash-test: Add exception for pxb-cxl
>
> MAINTAINERS | 7 +
> hw/Kconfig | 1 +
> hw/acpi/Kconfig | 5 +
> hw/acpi/cxl-stub.c | 12 +
> hw/acpi/cxl.c | 231 +++++++++++++
> hw/acpi/meson.build | 4 +-
> hw/arm/Kconfig | 1 +
> hw/arm/virt-acpi-build.c | 30 ++
> hw/arm/virt.c | 40 ++-
> hw/core/machine.c | 28 ++
> hw/cxl/Kconfig | 3 +
> hw/cxl/cxl-component-utils.c | 284 ++++++++++++++++
> hw/cxl/cxl-device-utils.c | 271 ++++++++++++++++
> hw/cxl/cxl-host-stubs.c | 22 ++
> hw/cxl/cxl-host.c | 263 +++++++++++++++
> hw/cxl/cxl-mailbox-utils.c | 483 ++++++++++++++++++++++++++++
> hw/cxl/meson.build | 12 +
> hw/i386/acpi-build.c | 98 ++++--
> hw/i386/pc.c | 57 +++-
> hw/mem/Kconfig | 5 +
> hw/mem/cxl_type3.c | 353 ++++++++++++++++++++
> hw/mem/meson.build | 1 +
> hw/meson.build | 1 +
> hw/pci-bridge/Kconfig | 5 +
> hw/pci-bridge/cxl_root_port.c | 231 +++++++++++++
> hw/pci-bridge/meson.build | 1 +
> hw/pci-bridge/pci_expander_bridge.c | 171 +++++++++-
> hw/pci-bridge/pcie_root_port.c | 6 +-
> hw/pci-host/gpex-acpi.c | 22 +-
> hw/pci/pci.c | 21 +-
> hw/pci/pcie_port.c | 25 ++
> include/hw/acpi/cxl.h | 28 ++
> include/hw/arm/virt.h | 1 +
> include/hw/boards.h | 2 +
> include/hw/cxl/cxl.h | 51 +++
> include/hw/cxl/cxl_component.h | 206 ++++++++++++
> include/hw/cxl/cxl_device.h | 272 ++++++++++++++++
> include/hw/cxl/cxl_pci.h | 160 +++++++++
> include/hw/pci/pci.h | 14 +
> include/hw/pci/pci_bridge.h | 20 ++
> include/hw/pci/pci_bus.h | 7 +
> include/hw/pci/pci_ids.h | 1 +
> include/hw/pci/pcie_port.h | 2 +
> qapi/machine.json | 15 +
> qemu-options.hx | 37 +++
> scripts/device-crash-test | 1 +
> softmmu/memory.c | 9 +
> softmmu/vl.c | 11 +
> tests/data/acpi/pc/CEDT | Bin 0 -> 36 bytes
> tests/data/acpi/q35/CEDT | Bin 0 -> 36 bytes
> tests/data/acpi/q35/DSDT.viot | Bin 9398 -> 9416 bytes
> tests/data/acpi/virt/CEDT | Bin 0 -> 36 bytes
> tests/qtest/cxl-test.c | 151 +++++++++
> tests/qtest/meson.build | 4 +
> 54 files changed, 3645 insertions(+), 41 deletions(-)
> create mode 100644 hw/acpi/cxl-stub.c
> create mode 100644 hw/acpi/cxl.c
> create mode 100644 hw/cxl/Kconfig
> create mode 100644 hw/cxl/cxl-component-utils.c
> create mode 100644 hw/cxl/cxl-device-utils.c
> create mode 100644 hw/cxl/cxl-host-stubs.c
> create mode 100644 hw/cxl/cxl-host.c
> create mode 100644 hw/cxl/cxl-mailbox-utils.c
> create mode 100644 hw/cxl/meson.build
> create mode 100644 hw/mem/cxl_type3.c
> create mode 100644 hw/pci-bridge/cxl_root_port.c
> create mode 100644 include/hw/acpi/cxl.h
> create mode 100644 include/hw/cxl/cxl.h
> create mode 100644 include/hw/cxl/cxl_component.h
> create mode 100644 include/hw/cxl/cxl_device.h
> create mode 100644 include/hw/cxl/cxl_pci.h
> create mode 100644 tests/data/acpi/pc/CEDT
> create mode 100644 tests/data/acpi/q35/CEDT
> create mode 100644 tests/data/acpi/virt/CEDT
> create mode 100644 tests/qtest/cxl-test.c
>
> --
> 2.32.0