qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 1/2] s390x/tcg: Implement Vector-Enhancements Facility 2 f


From: Richard Henderson
Subject: Re: [PATCH v1 1/2] s390x/tcg: Implement Vector-Enhancements Facility 2 for s390x
Date: Wed, 2 Mar 2022 22:58:45 -1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 3/2/22 17:22, David Miller wrote:
resolves: https://gitlab.com/qemu-project/qemu/-/issues/738

implements:
VECTOR LOAD ELEMENTS REVERSED               (VLER)
VECTOR LOAD BYTE REVERSED ELEMENTS          (VLBR)
VECTOR LOAD BYTE REVERSED ELEMENT           (VLEBRH, VLEBRF, VLEBRG)
VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO  (VLLEBRZ)
VECTOR LOAD BYTE REVERSED ELEMENT AND REPLOCATE (VLBRREP)
VECTOR STORE ELEMENTS REVERSED              (VSTER)
VECTOR STORE BYTE REVERSED ELEMENTS         (VSTBR)
VECTOR STORE BYTE REVERSED ELEMENTS         (VSTEBRH, VSTEBRF, VSTEBRG)
VECTOR SHIFT LEFT DOUBLE BY BIT             (VSLD)
VECTOR SHIFT RIGHT DOUBLE BY BIT            (VSRD)
VECTOR STRING SEARCH                        (VSTRS)

modifies:
VECTOR FP CONVERT FROM FIXED                (VCFPS)
VECTOR FP CONVERT FROM LOGICAL              (VCFPL)
VECTOR FP CONVERT TO FIXED                  (VCSFP)
VECTOR FP CONVERT TO LOGICAL                (VCLFP)
VECTOR SHIFT LEFT                           (VSL)
VECTOR SHIFT RIGHT ARITHMETIC               (VSRA)
VECTOR SHIFT RIGHT LOGICAL                  (VSRL)

Signed-off-by: David Miller <dmiller423@gmail.com>

Too many changes in one patch.
You need to split these into smaller, logical units.

+/* VECTOR LOAD BYTE REVERSED ELEMENT AND ZERO */
+    F(0xe604, VLLEBRZ, VRX,   VE2, la2, 0, 0, 0, vllebrz, 0, IF_VEC)
+/* VECTOR LOAD BYTE REVERSED ELEMENTS */
+       F(0xe606, VLBR,    VRX,   VE2, la2, 0, 0, 0, vlbr, 0, IF_VEC)
+/* VECTOR LOAD ELEMENTS REVERSED */
+       F(0xe607, VLER,    VRX,   VE2, la2, 0, 0, 0, vler, 0, IF_VEC)

Tabs, and more later.

@@ -457,6 +457,9 @@ static DisasJumpType op_vlrep(DisasContext *s, DisasOps *o)
      return DISAS_NEXT;
  }
+
+
+
  static DisasJumpType op_vle(DisasContext *s, DisasOps *o)

Do not add pointless whitespace.

+static DisasJumpType op_vlebr(DisasContext *s, DisasOps *o)
+{
+    const uint8_t es = (1 == s->fields.op2) ? 1 : (1 ^ s->fields.op2);
+    const uint8_t enr = get_field(s, m3);
+    TCGv_i64 tmp;
+
+    if (es < ES_16 || es > ES_64 || !valid_vec_element(enr, es)) {
+        gen_program_exception(s, PGM_SPECIFICATION);
+        return DISAS_NORETURN;
+    }
+
+    tmp = tcg_temp_new_i64();
+    tcg_gen_qemu_ld_i64(tmp, o->addr1, get_mem_index(s), MO_TE | es);

Just use a little-endian load: MO_LE | es.
While we use MO_TE all over, it's no secret that it's always big-endian.

And everywhere else you do load then swap, or swap then store.

+}
+
+
+
+static DisasJumpType op_vsteb(DisasContext *s, DisasOps *o)

More care with spacing.

+static inline void s390_vec_reverse(S390Vector *vdst,
+                                    S390Vector *vsrc, uint8_t es)
+{
+    const uint8_t elems = 1 << (4 - es);
+    uint32_t enr;
+
+    for (enr = 0; enr < elems; enr++) {
+        switch (es) {
+        case MO_8:
+            s390_vec_write_element8(vdst, enr,
+                           s390_vec_read_element8(vsrc, 15 ^ enr));
+            break;
+        case MO_16:
+            s390_vec_write_element16(vdst, enr,
+                           s390_vec_read_element16(vsrc, 7 ^ enr));
+            break;
+        case MO_32:
+            s390_vec_write_element32(vdst, enr,
+                           s390_vec_read_element32(vsrc, 3 ^ enr));
+            break;
+        case MO_64:
+            s390_vec_write_element64(vdst, enr,
+                           s390_vec_read_element64(vsrc, 1 ^ enr));
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
+}

This seems likely to go wrong for vdst == vsrc.
In addition, swapping the order of elements is something that can be done in 
parallel.

    l = src[lo], h = src[hi];
    switch (es) {
    case MO_64:
        dst[hi] = l, dst[lo] = h;
        break;
    case MO_8:
        dst[hi] = bswap64(l);
        dst[lo] = bswap64(h);
        break;
    case MO_16:
        dst[hi] = hswap64(l);
        dst[lo] = hswap64(h);
        break;
    case MO_32:
        dst[hi] = wswap64(l);
        dst[hi] = wswap64(h);
        break;
    }

which, really, can all be generated inline.


r~



reply via email to

[Prev in Thread] Current Thread [Next in Thread]