qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v5 11/35] target/arm: Implement SVE scatter stor


From: Peter Maydell
Subject: Re: [Qemu-devel] [PATCH v5 11/35] target/arm: Implement SVE scatter stores
Date: Mon, 25 Jun 2018 17:13:59 +0100

On 21 June 2018 at 02:53, Richard Henderson
<address@hidden> wrote:
> Signed-off-by: Richard Henderson <address@hidden>
> ---
>  target/arm/helper-sve.h    | 41 +++++++++++++++++++++
>  target/arm/sve_helper.c    | 62 ++++++++++++++++++++++++++++++++
>  target/arm/translate-sve.c | 74 ++++++++++++++++++++++++++++++++++++++
>  target/arm/sve.decode      | 39 ++++++++++++++++++++
>  4 files changed, 216 insertions(+)
> +/* Stores with a vector index.  */
> +
> +#define DO_ST1_ZPZ_S(NAME, TYPEI, FN)                                   \
> +void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
> +                  target_ulong base, uint32_t desc)                     \
> +{                                                                       \
> +    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
> +    unsigned scale = simd_data(desc);                                   \
> +    uintptr_t ra = GETPC();                                             \
> +    uint32_t *d = vd; TYPEI *m = vm; uint8_t *pg = vg;                  \
> +    for (i = 0; i < oprsz; i++) {                                       \
> +        uint8_t pp = pg[H1(i)];                                         \
> +        if (pp & 0x01) {                                                \
> +            target_ulong off = (target_ulong)m[H4(i * 2)] << scale;     \
> +            FN(env, base + off, d[H4(i * 2)], ra);                      \
> +        }                                                               \
> +        if (pp & 0x10) {                                                \
> +            target_ulong off = (target_ulong)m[H4(i * 2 + 1)] << scale; \
> +            FN(env, base + off, d[H4(i * 2 + 1)], ra);                  \
> +        }                                                               \
> +    }                                                                   \
> +}

Why do we do two operations per loop here? Generally
we seem to do one operation per loop elsewhere.


> +static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
> +{
> +    /* Indexed by [xs][msz].  */
> +    static gen_helper_gvec_mem_scatter * const fn32[2][3] = {
> +        { gen_helper_sve_stbs_zsu,
> +          gen_helper_sve_sths_zsu,
> +          gen_helper_sve_stss_zsu, },
> +        { gen_helper_sve_stbs_zss,
> +          gen_helper_sve_sths_zss,
> +          gen_helper_sve_stss_zss, },
> +    };
> +    static gen_helper_gvec_mem_scatter * const fn64[3][4] = {

In the pseudocode xs is either 0 (zero-extend offset) or
1 (sign-extend offset), but here it can also be 2. A
comment noting that we've overloaded it to also indicate
whether we're dealing with a 32-bit or 64-bit offset might
help (at least I think that's what we're doing).

> +        { gen_helper_sve_stbd_zsu,
> +          gen_helper_sve_sthd_zsu,
> +          gen_helper_sve_stsd_zsu,
> +          gen_helper_sve_stdd_zsu, },
> +        { gen_helper_sve_stbd_zss,
> +          gen_helper_sve_sthd_zss,
> +          gen_helper_sve_stsd_zss,
> +          gen_helper_sve_stdd_zss, },
> +        { gen_helper_sve_stbd_zd,
> +          gen_helper_sve_sthd_zd,
> +          gen_helper_sve_stsd_zd,
> +          gen_helper_sve_stdd_zd, },
> +    };
> +    gen_helper_gvec_mem_scatter *fn;
> +
> +    if (a->esz < a->msz || (a->msz == 0 && a->scale)) {
> +        return false;
> +    }
> +    if (!sve_access_check(s)) {
> +        return true;
> +    }
> +    switch (a->esz) {
> +    case MO_32:
> +        fn = fn32[a->xs][a->msz];
> +        break;
> +    case MO_64:
> +        fn = fn64[a->xs][a->msz];
> +        break;
> +    default:
> +        g_assert_not_reached();
> +    }
> +    do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
> +               cpu_reg_sp(s, a->rn), fn);
> +    return true;
> +}

Otherwise
Reviewed-by: Peter Maydell <address@hidden>

thanks
-- PMM



reply via email to

[Prev in Thread] Current Thread [Next in Thread]