From: Peter Maydell
Subject: Re: [PATCH v4 03/45] target/arm: Trap non-streaming usage when Streaming SVE is active
Date: Fri, 1 Jul 2022 12:06:36 +0100
On Tue, 28 Jun 2022 at 05:26, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This new behaviour is in the ARM pseudocode function
> AArch64.CheckFPAdvSIMDEnabled, which applies to AArch32
> via AArch32.CheckAdvSIMDOrFPEnabled when the EL to which
> the trap would be delivered is in AArch64 mode.
>
> Given that ARMv9 drops support for AArch32 outside EL0, the trap EL
> detection ought to be trivially true, but the pseudocode still contains
> a number of conditions, and QEMU has not yet committed to dropping A32
> support for EL[12] when v9 features are present.
>
> Since the computation of SME_TRAP_NONSTREAMING is necessarily different
> for the two modes, we might as well preserve bits within TBFLAG_ANY and
> allocate separate bits within TBFLAG_A32 and TBFLAG_A64 instead.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> +# These patterns are taken from Appendix E1.1 of DDI0616 A.a,
> +# Arm Architecture Reference Manual Supplement,
> +# The Scalable Matrix Extension (SME), for Armv9-A
> +
> +{
> + [
> + OK 0-00 1110 0000 0001 0010 11-- ---- ---- # SMOV W|Xd,Vn.B[0]
> + OK 0-00 1110 0000 0010 0010 11-- ---- ---- # SMOV W|Xd,Vn.H[0]
> + OK 0100 1110 0000 0100 0010 11-- ---- ---- # SMOV Xd,Vn.S[0]
> + OK 0000 1110 0000 0001 0011 11-- ---- ---- # UMOV Wd,Vn.B[0]
> + OK 0000 1110 0000 0010 0011 11-- ---- ---- # UMOV Wd,Vn.H[0]
> + OK 0000 1110 0000 0100 0011 11-- ---- ---- # UMOV Wd,Vn.S[0]
> + OK 0100 1110 0000 1000 0011 11-- ---- ---- # UMOV Xd,Vn.D[0]
> + ]
> + FAIL 0--0 111- ---- ---- ---- ---- ---- ---- # Advanced SIMD vector operations
> +}
> +
> +{
> + [
> + OK 0101 1110 --1- ---- 11-1 11-- ---- ---- # FMULX/FRECPS/FRSQRTS (scalar)
> + OK 0101 1110 -10- ---- 00-1 11-- ---- ---- # FMULX/FRECPS/FRSQRTS (scalar, FP16)
> + OK 01-1 1110 1-10 0001 11-1 10-- ---- ---- # FRECPE/FRSQRTE/FRECPX (scalar)
> + OK 01-1 1110 1111 1001 11-1 10-- ---- ---- # FRECPE/FRSQRTE/FRECPX (scalar, FP16)
> + ]
> + FAIL 01-1 111- ---- ---- ---- ---- ---- ---- # Advanced SIMD single-element operations
> +}
> +
> +FAIL 0-00 110- ---- ---- ---- ---- ---- ---- # Advanced SIMD structure load/store
> +FAIL 1100 1110 ---- ---- ---- ---- ---- ---- # Advanced SIMD cryptography extensions
> +
> +# These are the "avoidance of doubt" final table of Illegal Advanced SIMD instructions
> +# We don't actually need to include these, as the default is OK.
> +# -001 111- ---- ---- ---- ---- ---- ---- # Scalar floating-point operations
> +# --10 110- ---- ---- ---- ---- ---- ---- # Load/store pair of FP registers
> +# --01 1100 ---- ---- ---- ---- ---- ---- # Load FP register (PC-relative literal)
> +# --11 1100 --0- ---- ---- ---- ---- ---- # Load/store FP register (unscaled imm)
> +# --11 1100 --1- ---- ---- ---- ---- --10 # Load/store FP register (register offset)
> +# --11 1101 ---- ---- ---- ---- ---- ---- # Load/store FP register (scaled imm)
Don't we need a FAIL line for the "FJCVTZS should be illegal" case?
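
To make the concern concrete, here is a minimal sketch (not QEMU or decodetree code) that checks an instruction word against patterns written in the notation above. The helper insn_matches() is hypothetical, and the FJCVTZS encoding 0x1e7e0000 (Wd = W0, Dn = D0) is my reading of the A64 encoding rather than something stated in the patch. If that encoding is right, FJCVTZS falls inside the "-001 111-" scalar floating-point group, which the table leaves at the default of OK, and it is not caught by either of the Advanced SIMD FAIL patterns.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the 32-bit instruction word matches a pattern in the
 * style used by the table: '0' and '1' are fixed bits, '-' is a
 * don't-care bit, spaces are ignored, and bit 31 comes first.
 * Hypothetical helper for illustration only.
 */
bool insn_matches(uint32_t insn, const char *pattern)
{
    int bit = 31;

    for (const char *p = pattern; *p != '\0'; p++) {
        if (*p == ' ') {
            continue;
        }
        if (bit < 0) {
            return false;       /* pattern names more than 32 bits */
        }
        if (*p == '0' || *p == '1') {
            uint32_t want = (uint32_t)(*p - '0');
            if (((insn >> bit) & 1u) != want) {
                return false;   /* fixed bit does not match */
            }
        } else if (*p != '-') {
            return false;       /* unexpected character */
        }
        bit--;
    }
    return bit == -1;           /* exactly 32 bit positions consumed */
}

/* Assumed encoding of FJCVTZS W0, D0; not taken from the patch. */
#define FJCVTZS_W0_D0 0x1e7e0000u

/*
 * A FAIL line covering it might look like this (hypothetical, untested
 * against decodetree):
 *   FAIL 0001 1110 0111 1110 0000 00-- ---- ----   # FJCVTZS
 */
```

For example, insn_matches(FJCVTZS_W0_D0, "-001 111- ---- ---- ---- ---- ---- ----") is true under the assumed encoding, while the same call with the "0--0 111-" vector-operations pattern is false.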
> +FAIL 0000 0100 --1- ---- 1010 ---- ---- ---- # ADR
> +FAIL 0000 0100 --1- ---- 1011 -0-- ---- ---- # FTSSEL, FEXPA
> +FAIL 0000 0101 --10 0001 100- ---- ---- ---- # COMPACT
> +FAIL 0010 0101 --01 100- 1111 000- ---0 ---- # RDFFR, RDFFRS
> +FAIL 0010 0101 --10 1--- 1001 ---- ---- ---- # WRFFR, SETFFR
> +FAIL 0100 0101 --0- ---- 1011 ---- ---- ---- # BDEP, BEXT, BGRP
> +FAIL 0100 0101 000- ---- 0110 1--- ---- ---- # PMULLB, PMULLT (128b result)
> +FAIL 0110 0100 --1- ---- 1110 01-- ---- ---- # FMMLA, BFMMLA
> +FAIL 0110 0101 --0- ---- 0000 11-- ---- ---- # FTSMUL
> +FAIL 0110 0101 --01 0--- 100- ---- ---- ---- # FTMAD
> +FAIL 0110 0101 --01 1--- 001- ---- ---- ---- # FADDA
> +FAIL 0100 0101 --0- ---- 1001 10-- ---- ---- # SMMLA, UMMLA, USMMLA
> +FAIL 0100 0101 --1- ---- 1--- ---- ---- ---- # SVE2 string/histo/crypto instructions
> +FAIL 1000 010- -00- ---- 10-- ---- ---- ---- # SVE2 32-bit gather NT load (vector+scalar)
> +FAIL 1000 010- -00- ---- 111- ---- ---- ---- # SVE 32-bit gather prefetch (vector+imm)
> +FAIL 1000 0100 0-1- ---- 0--- ---- ---- ---- # SVE 32-bit gather prefetch (scalar+vector)
> +FAIL 1000 010- -01- ---- 1--- ---- ---- ---- # SVE 32-bit gather load (vector+imm)
> +FAIL 1000 0100 0-0- ---- 0--- ---- ---- ---- # SVE 32-bit gather load byte (scalar+vector)
> +FAIL 1000 0100 1--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load half (scalar+vector)
> +FAIL 1000 0101 0--- ---- 0--- ---- ---- ---- # SVE 32-bit gather load word (scalar+vector)
> +FAIL 1010 010- ---- ---- 011- ---- ---- ---- # SVE contiguous FF load (scalar+scalar)
> +FAIL 1010 010- ---1 ---- 101- ---- ---- ---- # SVE contiguous NF load (scalar+imm)
> +FAIL 1010 010- -10- ---- 000- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+scalar)
> +FAIL 1010 010- -100 ---- 001- ---- ---- ---- # SVE load & replicate 32 bytes (scalar+imm)
> +FAIL 1100 010- ---- ---- ---- ---- ---- ---- # SVE 64-bit gather load/prefetch
> +FAIL 1110 010- -00- ---- 001- ---- ---- ---- # SVE2 64-bit scatter NT store (vector+scalar)
> +FAIL 1110 010- -10- ---- 001- ---- ---- ---- # SVE2 32-bit scatter NT store (vector+scalar)
> +FAIL 1110 010- ---- ---- 1-0- ---- ---- ---- # SVE scatter store (scalar+32-bit vector)
> +FAIL 1110 010- ---- ---- 101- ---- ---- ---- # SVE scatter store (misc)
> @@ -11312,6 +11338,21 @@ static CPUARMTBFlags rebuild_hflags_a32(CPUARMState *env, int fp_el,
> DP_TBFLAG_ANY(flags, PSTATE__IL, 1);
> }
>
> + /*
> + * The SME exception we are testing for is raised via
> + * AArch64.CheckFPAdvSIMDEnabled(), and for AArch32 this is called
> + * when EL1 is using A64 or EL2 using A64 and !TGE.
> + * See AArch32.CheckAdvSIMDOrFPEnabled().
> + */
> + if (el == 0
> + && FIELD_EX64(env->svcr, SVCR, SM)
> + && (!arm_is_el2_enabled(env)
> + || (arm_el_is_aa64(env, 2) && !(env->cp15.hcr_el2 & HCR_TGE)))
> + && arm_el_is_aa64(env, 1)
> + && !sme_fa64(env, el)) {
I can't get any of:
* the logic in the comment
* the logic in the C code
* the logic in the pseudocode AArch32.CheckAdvSIMDOrFPEnabled()
which causes it to call AArch64.CheckFPEnabled()
to line up with each other.
The comment has:
* (EL1 A64) OR (EL2 A64 && !TGE)
The pseudocode has:
* (!TGE && EL1 A64) OR (TGE && EL2 A64 && EL1 A64)
[seems odd that it is checking the width of EL1 in the TGE case
but I haven't followed the logic through to find out why]
The C code here has:
* (!TGE && EL2 A64 && EL1 A64)
What am I missing?
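
For reference, the three conditions as summarised above can be written as predicates and compared over every input combination. This is only a sketch of my summaries, not of the real code: the function names are made up, EL2 is assumed to be enabled (so the !arm_is_el2_enabled() term is ignored), and no attempt is made to filter out combinations that may not be architecturally reachable.

```c
#include <stdbool.h>

/*
 * Inputs: tge is HCR_EL2.TGE; el2_a64 and el1_a64 say whether that EL
 * is using AArch64. All names here are hypothetical.
 */

/* The comment: (EL1 A64) OR (EL2 A64 && !TGE) */
bool cond_comment(bool tge, bool el2_a64, bool el1_a64)
{
    return el1_a64 || (el2_a64 && !tge);
}

/* The pseudocode: (!TGE && EL1 A64) OR (TGE && EL2 A64 && EL1 A64) */
bool cond_pseudocode(bool tge, bool el2_a64, bool el1_a64)
{
    return (!tge && el1_a64) || (tge && el2_a64 && el1_a64);
}

/* The C code: (!TGE && EL2 A64 && EL1 A64) */
bool cond_c_code(bool tge, bool el2_a64, bool el1_a64)
{
    return !tge && el2_a64 && el1_a64;
}

typedef bool (*cond_fn)(bool tge, bool el2_a64, bool el1_a64);

/* Count how many of the 8 input combinations make a and b disagree. */
int count_mismatches(cond_fn a, cond_fn b)
{
    int n = 0;

    for (int i = 0; i < 8; i++) {
        bool tge = (i & 4) != 0;
        bool el2_a64 = (i & 2) != 0;
        bool el1_a64 = (i & 1) != 0;

        if (a(tge, el2_a64, el1_a64) != b(tge, el2_a64, el1_a64)) {
            n++;
        }
    }
    return n;
}
```

In this simplified model each pair of predicates disagrees on at least one input, for example count_mismatches(cond_comment, cond_c_code) is non-zero, which is the mismatch I am asking about.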
thanks
-- PMM