[PATCH 00/55] target/arm: First slice of MVE implementation
From: Peter Maydell
Subject: [PATCH 00/55] target/arm: First slice of MVE implementation
Date: Mon, 7 Jun 2021 17:57:26 +0100
This patchseries provides an initial slice of the MVE
implementation. (MVE is "vector instructions for M-profile", also
known as Helium.)

This is not complete support by a long way -- it covers only about 35%
of the decode patterns for MVE, and it implements only the slow-path
"we need predication, drop out to a helper function" versions of
insns. I'm sending it out now for two reasons:
 * if there's something I need to change about the general structure
   or the way I'm implementing insns, I want to know now rather than
   after I've implemented the other two thirds of the ISA
 * if I hold onto the whole patchset until I've got a complete MVE
   implementation it'll be 150+ patches, 10000 lines of code, and
   a nightmare to code review

The series covers:
 * framework for MVE decode, including infrastructure for
   handling predication, PSR.ECI, etc
 * tail-predication forms of low-overhead-loop insns (LCTP, WLSTP,
   DLSTP, LETP)
 * basic (non-gather) loads and stores
 * pretty much all the integer 2-operand vector and scalar insns
 * most of the integer 1-operand insns
 * a handful of other insns
(Unfortunately the v8M Arm ARM does not provide a nice neatly
separated list of encodings the way the SVE2 XML does. I ended up
just pulling all the decode patterns out of the Arm ARM insn
descriptions and then hand-sorting them into what looked like common
formats. So the insns implemented aren't following a 100% logical
order.)
As noted above, the implementation here is purely the slow-path
fully-generic "call helpers that can handle predication". I do
want to implement a fast-path for "we know we have no predication,
so we can generate inline vector code", but I'd like to do that
as a series of followup patches once the main MVE code has landed.
That will (a) make it easier to review, I hope, (b) mean we get to
"at least functional" MVE quicker, and (c) allow people to bisect
any regressions to the "add fastpath" patch.
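
To make the "call helpers that can handle predication" shape concrete,
here is a minimal standalone sketch. It is not code from this series.
It assumes a 128-bit vector held as 16 bytes and a 16-bit predicate
mask with one bit per byte lane, so a result is written back only in
the byte lanes whose mask bit is set. The name pred_add32() and the
data layout are hypothetical, chosen purely for illustration.

```c
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16  /* one 128-bit vector register, as raw bytes */

/*
 * Hypothetical slow-path helper: add two vectors of 32-bit elements and
 * write back only the bytes whose predicate bit is set.
 * 'mask' carries one bit per byte of the vector (bit 0 = byte 0).
 */
static void pred_add32(uint8_t *d, const uint8_t *n, const uint8_t *m,
                       uint16_t mask)
{
    for (unsigned e = 0; e < VEC_BYTES / 4; e++) {
        uint32_t a, b, r;
        uint8_t rb[4];

        /* Load the two source elements in host byte order. */
        memcpy(&a, n + e * 4, 4);
        memcpy(&b, m + e * 4, 4);
        r = a + b;
        memcpy(rb, &r, 4);

        /* Merge byte by byte: unselected lanes keep their old value. */
        for (unsigned i = 0; i < 4; i++) {
            if (mask & (1u << (e * 4 + i))) {
                d[e * 4 + i] = rb[i];
            }
        }
    }
}
```

With a mask of 0x0f0f only elements 0 and 2 of the destination are
updated and elements 1 and 3 keep their previous contents. A fast path
would only be able to skip the per-byte merge when the mask is known
to be all ones.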
Almost nothing in this patchseries is "live code", because no CPU sets
the ID register bits to turn on MVE. The exception is the handling of
PSR.ECI/ICI, which is enabled at least as far as the ICI bits go for
M-profile CPUs (thus implementing the previously missing corner-case
requirement that trying to execute a non-continuable insn with
non-zero ICI should fault).
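
The ICI corner case above boils down to a very small rule, sketched
here as a standalone model rather than as the code in this series. The
names InsnCheck and check_ici() are hypothetical, and the model says
nothing about which exception is raised or how the translator discards
the ops it has already emitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    INSN_OK,     /* insn may execute */
    INSN_FAULT   /* insn must fault instead of executing */
} InsnCheck;

/*
 * Hypothetical model of the rule: an insn that cannot be continued
 * must fault if the ICI bits are non-zero when it is executed.
 */
static InsnCheck check_ici(bool insn_is_continuable, uint32_t ici)
{
    if (ici != 0 && !insn_is_continuable) {
        return INSN_FAULT;
    }
    return INSN_OK;
}
```

In this model a continuable insn is always allowed through, and a
non-continuable one is allowed only when ICI is zero.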
My view is that if these patches get through code review we're better
off with them in upstream git rather than outside it; open to
arguments to the contrary.
Patch 1 is RTH's recently posted tcg_remove_ops_after() patch,
which we need for the PSR.ECI handling (which indeed is the
justification for having that new function in the first place).
You can also get this patchset here:
https://git.linaro.org/people/peter.maydell/qemu-arm.git mve-drop-1
thanks
-- PMM
Peter Maydell (54):
target/arm: Enable FPSCR.QC bit for MVE
target/arm: Handle VPR semantics in existing code
target/arm: Add handling for PSR.ECI/ICI
target/arm: Let vfp_access_check() handle late NOCP checks
target/arm: Implement MVE LCTP
target/arm: Implement MVE WLSTP insn
target/arm: Implement MVE DLSTP
target/arm: Implement MVE LETP insn
target/arm: Add framework for MVE decode
target/arm: Implement MVE VLDR/VSTR (non-widening forms)
target/arm: Implement widening/narrowing MVE VLDR/VSTR insns
target/arm: Implement MVE VCLZ
target/arm: Implement MVE VCLS
bitops.h: Provide hswap32(), hswap64(), wswap64() swapping operations
target/arm: Implement MVE VREV16, VREV32, VREV64
target/arm: Implement MVE VMVN (register)
target/arm: Implement MVE VABS
target/arm: Implement MVE VNEG
target/arm: Implement MVE VDUP
target/arm: Implement MVE VAND, VBIC, VORR, VORN, VEOR
target/arm: Implement MVE VADD, VSUB, VMUL
target/arm: Implement MVE VMULH
target/arm: Implement MVE VRMULH
target/arm: Implement MVE VMAX, VMIN
target/arm: Implement MVE VABD
target/arm: Implement MVE VHADD, VHSUB
target/arm: Implement MVE VMULL
target/arm: Implement MVE VMLALDAV
target/arm: Implement MVE VMLSLDAV
include/qemu/int128.h: Add function to create Int128 from int64_t
target/arm: Implement MVE VRMLALDAVH, VRMLSLDAVH
target/arm: Implement MVE VADD (scalar)
target/arm: Implement MVE VSUB, VMUL (scalar)
target/arm: Implement MVE VHADD, VHSUB (scalar)
target/arm: Implement MVE VBRSR
target/arm: Implement MVE VPST
target/arm: Implement MVE VQADD and VQSUB
target/arm: Implement MVE VQDMULH and VQRDMULH (scalar)
target/arm: Implement MVE VQDMULL scalar
target/arm: Implement MVE VQDMULH, VQRDMULH (vector)
target/arm: Implement MVE VQADD, VQSUB (vector)
target/arm: Implement MVE VQSHL (vector)
target/arm: Implement MVE VQRSHL
target/arm: Implement MVE VSHL insn
target/arm: Implement MVE VRSHL
target/arm: Implement MVE VQDMLADH and VQRDMLADH
target/arm: Implement MVE VQDMLSDH and VQRDMLSDH
target/arm: Implement MVE VQDMULL (vector)
target/arm: Implement MVE VRHADD
target/arm: Implement MVE VADC, VSBC
target/arm: Implement MVE VCADD
target/arm: Implement MVE VHCADD
target/arm: Implement MVE VADDV
target/arm: Make VMOV scalar <-> gpreg beatwise for MVE
Richard Henderson (1):
tcg: Introduce tcg_remove_ops_after
include/qemu/bitops.h | 29 +
include/qemu/int128.h | 10 +
include/tcg/tcg.h | 1 +
target/arm/helper-mve.h | 357 +++++++++
target/arm/helper.h | 2 +
target/arm/internals.h | 11 +
target/arm/translate-a32.h | 4 +
target/arm/translate.h | 19 +
target/arm/mve.decode | 261 +++++++
target/arm/t32.decode | 15 +-
target/arm/m_helper.c | 54 +-
target/arm/mve_helper.c | 1343 +++++++++++++++++++++++++++++++++
target/arm/sve_helper.c | 20 -
target/arm/translate-m-nocp.c | 16 +-
target/arm/translate-mve.c | 865 +++++++++++++++++++++
target/arm/translate-vfp.c | 152 +++-
target/arm/translate.c | 301 +++++++-
target/arm/vfp_helper.c | 3 +-
tcg/tcg.c | 13 +
target/arm/meson.build | 3 +
20 files changed, 3408 insertions(+), 71 deletions(-)
create mode 100644 target/arm/helper-mve.h
create mode 100644 target/arm/mve.decode
create mode 100644 target/arm/mve_helper.c
create mode 100644 target/arm/translate-mve.c
--
2.20.1