qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 00/43] CXl 2.0 emulation Support


From: Michael S. Tsirkin
Subject: Re: [PATCH v5 00/43] CXl 2.0 emulation Support
Date: Fri, 4 Feb 2022 09:27:08 -0500

On Fri, Feb 04, 2022 at 09:03:27AM -0500, Michael S. Tsirkin wrote:
> On Wed, Feb 02, 2022 at 02:09:54PM +0000, Jonathan Cameron wrote:
> > Changes since v4:
> > https://lore.kernel.org/linux-cxl/20220124171705.10432-1-Jonathan.Cameron@huawei.com/
> > 
> > Note documentation patch that Alex requested to follow.
> > I don't want to delay getting this out as Alex mentioned possibly
> > having time to continue reviewing in latter part of this week.
> > 
> > Issues identified by CI / Alex Bennée
> > - Stubs added for hw/cxl/cxl-host and hw/acpi/cxl plus related meson
> >   changes to use them as necessary.
> > - Drop uid from cxl-test (result of last minute change in v4 that was not
> >   carried through to the test)
> > - Fix naming clash with field name ERROR which on some arches is defined
> >   and results in the string being replaced with 0 in some of the
> >   register field related defines.  Call it ERR instead.
> > - Fix type issue around mr->size by using 64 bit acessor functions.
> > - Add a new patch to exclude pxb-cxl from device-crash-test in similar
> >   fashion to pxb.
> > 
> > CI tests now passing with exception of checkpatch which has what
> > I think is a false positive and build-oss-fuzz which keeps timing out.
> > https://gitlab.com/jic23/qemu/-/pipelines/460109208
> > There were a few tweaks to patch descriptions after I pushed that
> > out (I missed a few RB from Alex).
> 
> There's an RFC patch that needs review from core memory maintainers,
> so I guess not all of it is for merge just yet?
> Is there any way we can start applying this patchset gradually?

For example, pick up patches 1-13 for now? They seem to be ready ...

> 
> > Other changes (mostly from Alex's review)
> > - Change component register handling to now report UNIMP and return 0
> >   for 8 byte registers as we currently don't implement any of them.
> >   Note that this means we need a kernel fix:
> >   
> > https://lore.kernel.org/linux-cxl/20220201153437.2873-1-Jonathan.Cameron@huawei.com/
> > - Drop majority of the macros used in defining mailbox handlers in
> >   favour of written out code.
> > - Use REG64 where appropriate. This was introduced whilst this set
> >   has been underdevelopment so I missed it.
> > - Clarify some register access options wrt to CXL 2.0 Errata F4.
> > - Change timestamp to qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL)
> > - Use typed enums to enforce types of function arguements.
> > - Default to cxl being off in machine_class_init() removing
> >   need to set it to off in machines where there is no support as yet.
> > - Add Alex's RB where given.
> > 
> > Looking in particular for:
> > * Review of the PCI interactions
> > * x86 and ARM machine interactions (particularly the memory maps)
> > * Review of the interleaving approach - is the basic idea
> >   acceptable?
> > * Review of the command line interface.
> > * CXL related review welcome but much of that got reviewed
> >   in earlier versions and hasn't changed substantially.
> > 
> > Big TODOs:
> > 
> > * Interleave boundary issues. I haven't yet solved this but didn't
> >   want to futher delay the review of the rest of the series.
> > 
> > * Volatile memory devices (easy but it's more code so left for now).
> > * Switch support. Linux kernel support is under review currently,
> >   so there is now something to test against.
> > * Hotplug?  May not need much but it's not tested yet!
> > * More tests and tighter verification that values written to hardware
> >   are actually valid - stuff that real hardware would check.
> > * Testing, testing and more testing.  I have been running a basic
> >   set of ARM and x86 tests on this, but there is always room for
> >   more tests and greater automation.
> > * CFMWS flags as requested by Ben.
> > 
> > Why do we want QEMU emulation of CXL?
> > 
> > As Ben stated in V3, QEMU support has been critical to getting OS
> > software written given lack of availability of hardware supporting the
> > latest CXL features (coupled with very high demand for support being
> > ready in a timely fashion). What has become clear since Ben's v3
> > is that situation is a continuous one. Whilst we can't talk about
> > them yet, CXL 3.0 features and OS support have been prototyped on
> > top of this support and a lot of the ongoing kernel work is being
> > tested against these patches. The kernel CXL mocking code allows
> > some forms of testing, but QEMU provides a more versatile and
> > exensible platform.
> > 
> > Other features on the qemu-list that build on these include PCI-DOE
> > /CDAT support from the Avery Design team further showing how this
> > code is useful. Whilst not directly related this is also the test
> > platform for work on PCI IDE/CMA + related DMTF SPDM as CXL both
> > utilizes and extends those technologies and is likely to be an early
> > adopter.
> > Refs:
> > CMA Kernel: 
> > https://lore.kernel.org/all/20210804161839.3492053-1-Jonathan.Cameron@huawei.com/
> > CMA Qemu: 
> > https://lore.kernel.org/qemu-devel/1624665723-5169-1-git-send-email-cbrowy@avery-design.com/
> > DOE Qemu: 
> > https://lore.kernel.org/qemu-devel/1623329999-15662-1-git-send-email-cbrowy@avery-design.com/
> > 
> > As can be seen there is non trivial interaction with other areas of
> > Qemu, particularly PCI and keeping this set up to date is proving
> > a burden we'd rather do without :)
> > 
> > Ben mentioned a few other good reasons in v3:
> > https://lore.kernel.org/qemu-devel/20210202005948.241655-1-ben.widawsky@intel.com/
> > 
> > The evolution of this series perhaps leave it in a less than
> > entirely obvious order and that may get tidied up in future postings.
> > I'm also open to this being considered in bite sized chunks.  What
> > we have here is about what you need for it to be useful for testing
> > currently kernel code.  Note the kernel code is moving fast so
> > since v4, some features have been introduced we don't yet support in
> > QEMU (e.g. use of the PCIe serial number extended capability).
> > 
> > All comments welcome.
> > 
> > qemu-system-aarch64 -M virt,gic-version=3,cxl=on \
> >  -m 4g,maxmem=8G,slots=8 \
> >  ...
> >  -object 
> > memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest.raw,size=256M
> >  \
> >  -object 
> > memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M
> >  \
> >  -object 
> > memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M
> >  \
> >  -object 
> > memory-backend-file,id=cxl-mem4,share=on,mem-path=/tmp/cxltest4.raw,size=256M
> >  \
> >  -object 
> > memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \
> >  -object 
> > memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M \
> >  -object 
> > memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M \
> >  -object 
> > memory-backend-file,id=cxl-lsa4,share=on,mem-path=/tmp/lsa4.raw,size=256M \
> >  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
> >  -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \
> >  -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \
> >  -device 
> > cxl-type3,bus=root_port13,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,size=256M
> >  \
> >  -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \
> >  -device 
> > cxl-type3,bus=root_port14,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,size=256M
> >  \
> >  -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \
> >  -device 
> > cxl-type3,bus=root_port15,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,size=256M
> >  \
> >  -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \
> >  -device 
> > cxl-type3,bus=root_port16,memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,size=256M
> >  \
> >  -cxl-fixed-memory-window targets=cxl.1,size=4G,interleave-granularity=8k \
> >  -cxl-fixed-memory-window 
> > targets=cxl.1,targets=cxl.2,size=4G,interleave-granularity=8k
> > 
> > First CFMWS suitable for up to 2 way interleave, the second for 4 way (2 way
> > at host level and 2 way at the host bridge).
> > targets=<range of pxb-cxl uids> , multiple entries if range is disjoint.
> > 
> > With the v5.17-rc1 + patch series listed below.
> > 
> >  cd /sys/bus/cxl/devices/
> >  region=$(cat decoder0.1/create_region)
> >  echo $region  > decoder0.1/create_region
> >  ls -lh
> >  
> >  //Note the order of devices and adjust the following to make sure they
> >  //are in order across the 4 root ports.  Easy to do in a tool, but
> >  //not easy to paste in a cover letter.
> > 
> >  cd region0.1\:0
> >  echo 4 > interleave_ways
> >  echo mem2 > target0
> >  echo mem3 > target1
> >  echo mem0 > target2
> >  echo mem1 > target3
> >  echo $((1024<<20)) > size
> >  echo 4096 > interleave_granularity
> >  echo region0.1:0 > /sys/bus/cxl/drivers/cxl_region/bind
> > 
> > Tested with devmem2 and files with known content.
> > Kernel tree is mainline + (I based on 5.17-rc1)
> > [PATCH] cxl/regs: Fix size of CXL Capabilty Header Register
> > https://lore.kernel.org/linux-cxl/20220201182934.jjvavjsf4h7oqngv@intel.com/T/#t
> > 
> > [PATCH v3 00/40] CXL.mem Topology Discovery and Hotplug Support
> > https://lore.kernel.org/linux-cxl/164298411792.3018233.7493009997525360044.stgit@dwillia2-desk3.amr.corp.intel.com/
> > Note that series has a lot of v4/v5 patches are replies but b4 does
> > a good job of pulling out the latest.
> > 
> > [PATCH 0/2] cxl/port: Robustness fixes for decoder enumeration
> > https://lore.kernel.org/linux-cxl/164317463887.3438644.4087819721493502301.stgit@dwillia2-desk3.amr.corp.intel.com/
> > 
> > [PATCH 0/4] Unify meaning of interleave attributes
> > https://lore.kernel.org/linux-cxl/20220127212911.127741-1-ben.widawsky@intel.com/
> > 
> > [PATCH v3 00/14] CXL Region driver
> > https://lore.kernel.org/linux-cxl/20220128002707.391076-1-ben.widawsky@intel.com/
> > 
> > What follows is a first attempt at explaining how all these components
> > fit together.  I'll write up some formal documentation shortly.
> > 
> > Memory Address Map for CXL elements.  Note where exactly these regions
> > appear is Arch and platform dependent.  
> > 
> >   Base somewhere far up in the Host PA map.
> > _______________________________
> > |                              |
> > | CXL Host Bridge 0 Registers  | 
> > | CXL Host Bridge 1 Registers  |
> > |       ...                    |  This bit is normal MMIO register space.
> > | CXL Host bridge N registers  |  including programmable interleave 
> > decoders 
> > |______________________________|  for interleave across root ports.
> > |                              |
> >               ....     
> > |                              |
> > |______________________________|
> > |                              |
> > |   CFMW 0,                    |  Note that there can be multiple regions
> > |   Interleave 2 way, targets  |  of memory within this 1TB which can be
> > |   Hostbridge 0, Hostbridge 1 |  interleaved differently: in the host 
> > bridges
> > |   Granularity 16KiB, 1TB     |  across root ports or in switches below 
> > the root.
> > |______________________________|  ports
> > |                              |
> > |   CFMW 1,                    |
> > |   Interleave 1 way, target   |
> > |   Hostbridge 0, 512GiB       | 
> > |______________________________|
> > etc for all interleave combinations
> > configured, or built in to the
> > system before any generic software
> > sees it.
> > 
> > System Topology considering CFMW 0 only to keep this simple.
> > x marks the match in each decoder level.
> > Switches have more interleave decoders and other features
> > that we haven't implemented yet in QEMU.
> > 
> >                 Address Read to CFMW0 base + N
> >               _________________|________________
> >              |                                  |
> >              |  Host interconnect               |  
> >              |  Configured to route CFM         |
> >              |  memory access to particular HB  |
> >              |_____x____________________________|
> >                    |                     |
> >              Interleave Decoder          |
> >              Matches this HB             |  
> >                    |                     |
> >             _______|__________      _____|____________
> >            |                  |    |                  |
> >            | CXL HB 0         |    | CXL HB 1         | Only exist in PCI 
> > (mostly)
> >            | HB IntLv Decoder |    | HB IntLv Decoder | via ACPI description
> >            |  PCI Root Bus 0c |    | PCI Root Bus 0d  |
> >            |x_________________|    |__________________| In CXL have MMIO
> >             |                |       |               |  at location given 
> > in CEDT
> >             |                |       |               |  CHBS entry (ACPI)
> > ____________|___   __________|__   __|_________   ___|_________ 
> > |  Root Port 0  | | Root Port 1 | | Root Port 2| | Root Port 3 |
> > |  Appears in   | | Appears in  | | Appears in | | Appear in   |
> > |  PCI topology | | PCI Topology| | PCI Topo   | | PCI Topo    |
> > |  As 0c:00.0   | | as 0c:01.0  | | as de:00.0 | | as de:01.0  |
> > |_______________| |_____________| |____________| |_____________|
> >       |                  |               |              |
> >       |                  |               |              |
> >  _____|_________   ______|______   ______|_____   ______|_______
> > |     x         | |             | |            | |              |
> > | CXL Type3 0   | | CXL Type3 1 | | CXL type3 2| | CLX Type 3 3 |
> > |               | |             | |            | |              |
> > | PMEM0(Vol LSA)| | PMEM1 (...) | | PMEM2 (...)| | PMEM3 (...)  |
> > | Decoder to go | |             | |            | |              |
> > | from host PA  | | PCI 0e:00.0 | | PCI df:00.0| | PCI e0:00.0  |
> > | to device PA  | |             | |            | |              | 
> > | PCI as 0d:00.0| |             | |            | |              |
> > |_______________| |_____________| |____________| |______________|
> > 
> >    Backed by        Backed by       Backed by       Backed by
> >     file 0           file 1           file 2          file 3
> > 
> > LSA backed by additional files for each device (not yet supported)
> > 
> > So currently we have decoders as follows for each interleaved access.
> > 1) CFMW decoder - fixed config so forms part of qemu command line.
> > 2) Host bridge decoders - programmable decoders that the system
> >    software will program either based on user command or based
> >    on info from the Label Storage Area (not yet emulated)
> > 3) Type 3 device decoders. Down to here the address used is the
> >    Host PA.  These decoders convert to the local device PA
> >    (in simple case - drop some bits in the middle of the address)
> > 
> > Future patches will add decoders in switch upstream ports making
> > the above diagram have another layer between root ports and
> > the memory devices.
> > 
> > Note, we've focused for now on Persistent Memory devices as they are seen
> > as an early and important usecase (and are the most complex one).
> > But it should be straight forward to add volatile memory
> > support and indeed that would be backed by RAM.
> > 
> > lspci -tv for above shows
> > 
> > -+-[0000:00]-+-00.0 Red Hat, Inc. QEMU PCIe Host Bridge (this is the cxl 
> > PXB)f
> >  |           \-OTHER STUFF
> >  +-[0000:0c]-+-00.0-[0d]----00.0  Intel Corporation Device 0d93
> >  |           \-01.0-[0e]----00.0  Intel Corporation Device 0d93
> >  \-[0000:de]-+-00.0-[df]----00.0  Intel Corporation Device 0d93
> >              \-01.0-[e0]----00.0  Intel Corporation Device 0d93
> > 
> > Where those Intel parts are the type 3 devices.
> > 
> > All comments welcome!
> > 
> > Particular thanks to Alex Bennée for his review of v4.
> > 
> > Thanks,
> > 
> > Jonathan
> > 
> > Ben Widawsky (26):
> >   hw/pci/cxl: Add a CXL component type (interface)
> >   hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)
> >   hw/cxl/device: Introduce a CXL device (8.2.8)
> >   hw/cxl/device: Implement the CAP array (8.2.8.1-2)
> >   hw/cxl/device: Implement basic mailbox (8.2.8.4)
> >   hw/cxl/device: Add memory device utilities
> >   hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)
> >   hw/cxl/device: Timestamp implementation (8.2.9.3)
> >   hw/cxl/device: Add log commands (8.2.9.4) + CEL
> >   hw/pxb: Use a type for realizing expanders
> >   hw/pci/cxl: Create a CXL bus type
> >   hw/pxb: Allow creation of a CXL PXB (host bridge)
> >   acpi/pci: Consolidate host bridge setup
> >   hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)
> >   hw/cxl/rp: Add a root port
> >   hw/cxl/device: Add a memory device (8.2.8.5)
> >   hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)
> >   acpi/cxl: Add _OSC implementation (9.14.2)
> >   tests/acpi: allow CEDT table addition
> >   acpi/cxl: Create the CEDT (9.14.1)
> >   hw/cxl/device: Add some trivial commands
> >   hw/cxl/device: Plumb real Label Storage Area (LSA) sizing
> >   hw/cxl/device: Implement get/set Label Storage Area (LSA)
> >   acpi/cxl: Introduce CFMWS structures in CEDT
> >   hw/cxl/component Add a dumb HDM decoder handler
> >   qtest/cxl: Add very basic sanity tests
> > 
> > Jonathan Cameron (17):
> >   MAINTAINERS: Add entry for Compute Express Link Emulation
> >   tests/acpi: allow DSDT.viot table changes.
> >   tests/acpi: Add update DSDT.viot
> >   cxl: Machine level control on whether CXL support is enabled
> >   hw/cxl/component: Add utils for interleave parameter encoding/decoding
> >   hw/cxl/host: Add support for CXL Fixed Memory Windows.
> >   hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl
> >   pci/pcie_port: Add pci_find_port_by_pn()
> >   CXL/cxl_component: Add cxl_get_hb_cstate()
> >   mem/cxl_type3: Add read and write functions for associated hostmem.
> >   cxl/cxl-host: Add memops for CFMWS region.
> >   arm/virt: Allow virt/CEDT creation
> >   hw/arm/virt: Basic CXL enablement on pci_expander_bridge instances
> >     pxb-cxl
> >   RFC: softmmu/memory: Add ops to memory_region_ram_init_from_file
> >   i386/pc: Enable CXL fixed memory windows
> >   qtest/acpi: Add reference CEDT tables.
> >   scripts/device-crash-test: Add exception for pxb-cxl
> > 
> >  MAINTAINERS                         |   7 +
> >  hw/Kconfig                          |   1 +
> >  hw/acpi/Kconfig                     |   5 +
> >  hw/acpi/cxl-stub.c                  |  12 +
> >  hw/acpi/cxl.c                       | 231 +++++++++++++
> >  hw/acpi/meson.build                 |   4 +-
> >  hw/arm/Kconfig                      |   1 +
> >  hw/arm/virt-acpi-build.c            |  30 ++
> >  hw/arm/virt.c                       |  40 ++-
> >  hw/core/machine.c                   |  28 ++
> >  hw/cxl/Kconfig                      |   3 +
> >  hw/cxl/cxl-component-utils.c        | 284 ++++++++++++++++
> >  hw/cxl/cxl-device-utils.c           | 271 ++++++++++++++++
> >  hw/cxl/cxl-host-stubs.c             |  22 ++
> >  hw/cxl/cxl-host.c                   | 263 +++++++++++++++
> >  hw/cxl/cxl-mailbox-utils.c          | 483 ++++++++++++++++++++++++++++
> >  hw/cxl/meson.build                  |  12 +
> >  hw/i386/acpi-build.c                |  98 ++++--
> >  hw/i386/pc.c                        |  57 +++-
> >  hw/mem/Kconfig                      |   5 +
> >  hw/mem/cxl_type3.c                  | 353 ++++++++++++++++++++
> >  hw/mem/meson.build                  |   1 +
> >  hw/meson.build                      |   1 +
> >  hw/pci-bridge/Kconfig               |   5 +
> >  hw/pci-bridge/cxl_root_port.c       | 231 +++++++++++++
> >  hw/pci-bridge/meson.build           |   1 +
> >  hw/pci-bridge/pci_expander_bridge.c | 171 +++++++++-
> >  hw/pci-bridge/pcie_root_port.c      |   6 +-
> >  hw/pci-host/gpex-acpi.c             |  22 +-
> >  hw/pci/pci.c                        |  21 +-
> >  hw/pci/pcie_port.c                  |  25 ++
> >  include/hw/acpi/cxl.h               |  28 ++
> >  include/hw/arm/virt.h               |   1 +
> >  include/hw/boards.h                 |   2 +
> >  include/hw/cxl/cxl.h                |  51 +++
> >  include/hw/cxl/cxl_component.h      | 206 ++++++++++++
> >  include/hw/cxl/cxl_device.h         | 272 ++++++++++++++++
> >  include/hw/cxl/cxl_pci.h            | 160 +++++++++
> >  include/hw/pci/pci.h                |  14 +
> >  include/hw/pci/pci_bridge.h         |  20 ++
> >  include/hw/pci/pci_bus.h            |   7 +
> >  include/hw/pci/pci_ids.h            |   1 +
> >  include/hw/pci/pcie_port.h          |   2 +
> >  qapi/machine.json                   |  15 +
> >  qemu-options.hx                     |  37 +++
> >  scripts/device-crash-test           |   1 +
> >  softmmu/memory.c                    |   9 +
> >  softmmu/vl.c                        |  11 +
> >  tests/data/acpi/pc/CEDT             | Bin 0 -> 36 bytes
> >  tests/data/acpi/q35/CEDT            | Bin 0 -> 36 bytes
> >  tests/data/acpi/q35/DSDT.viot       | Bin 9398 -> 9416 bytes
> >  tests/data/acpi/virt/CEDT           | Bin 0 -> 36 bytes
> >  tests/qtest/cxl-test.c              | 151 +++++++++
> >  tests/qtest/meson.build             |   4 +
> >  54 files changed, 3645 insertions(+), 41 deletions(-)
> >  create mode 100644 hw/acpi/cxl-stub.c
> >  create mode 100644 hw/acpi/cxl.c
> >  create mode 100644 hw/cxl/Kconfig
> >  create mode 100644 hw/cxl/cxl-component-utils.c
> >  create mode 100644 hw/cxl/cxl-device-utils.c
> >  create mode 100644 hw/cxl/cxl-host-stubs.c
> >  create mode 100644 hw/cxl/cxl-host.c
> >  create mode 100644 hw/cxl/cxl-mailbox-utils.c
> >  create mode 100644 hw/cxl/meson.build
> >  create mode 100644 hw/mem/cxl_type3.c
> >  create mode 100644 hw/pci-bridge/cxl_root_port.c
> >  create mode 100644 include/hw/acpi/cxl.h
> >  create mode 100644 include/hw/cxl/cxl.h
> >  create mode 100644 include/hw/cxl/cxl_component.h
> >  create mode 100644 include/hw/cxl/cxl_device.h
> >  create mode 100644 include/hw/cxl/cxl_pci.h
> >  create mode 100644 tests/data/acpi/pc/CEDT
> >  create mode 100644 tests/data/acpi/q35/CEDT
> >  create mode 100644 tests/data/acpi/virt/CEDT
> >  create mode 100644 tests/qtest/cxl-test.c
> > 
> > -- 
> > 2.32.0




reply via email to

[Prev in Thread] Current Thread [Next in Thread]