qemu-devel

Re: [PATCH RFC] target/arm: Implement SVE2 MATCH, NMATCH


From: Richard Henderson
Subject: Re: [PATCH RFC] target/arm: Implement SVE2 MATCH, NMATCH
Date: Tue, 14 Apr 2020 07:47:39 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1

On 4/13/20 4:42 PM, Stephen Long wrote:
> +#define DO_ZPZZ_CHAR_MATCH(NAME, TYPE, H, EQUALS)                          \
> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *vg, uint32_t desc)   \
> +{                                                                          \
> +    intptr_t i, opr_sz = simd_oprsz(desc);                                 \
> +    for (i = 0; i < opr_sz; i += sizeof(TYPE)) {                           \
> +        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                    \
> +        uint16_t *pd = (uint16_t *)(vd + H1_2(i >> 3));                    \
> +        *pd = (*pd & ~1) | ((0 & EQUALS) | (1 & !EQUALS));                 \
> +        if (pg & 1) {                                                      \

The important error here is that the predicate is not always the low bit.  When
operating on bytes, every bit of the predicate is significant.  When operating
on halfwords, every even bit of the predicate is significant.  In addition,
when operating on halfwords, every odd bit of the result predicate must be zero.

Which is why, generally, I have constructed the output predicate as we go.
See, for instance, DO_CMP_PPZZ.
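As an illustration of what "constructing the output predicate as we go" means, here is a hypothetical per-segment sketch for halfword elements (not the actual DO_CMP_PPZZ macro; the function name and fixed 8-element segment are assumptions for the example). Each halfword element owns two predicate bits; because the loop only ever ORs in the even bit of an active, matching element, the odd bits and all inactive elements' bits come out zero for free:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch (not QEMU code): MATCH over one 16-byte segment
 * of halfword elements, accumulating the output predicate in a local
 * instead of read-modify-writing bit 0 of *pd per element.
 */
static uint16_t match_h_segment(const uint16_t n[8], const uint16_t m[8],
                                uint16_t pg)
{
    uint16_t out = 0;
    for (unsigned e = 0; e < 8; e++) {
        unsigned bit = 2 * e;                /* element e's predicate bit */
        if ((pg >> bit) & 1) {               /* governing predicate active */
            bool eq = false;
            for (unsigned j = 0; j < 8; j++) {
                if (n[e] == m[j]) {          /* match anywhere in segment */
                    eq = true;
                    break;
                }
            }
            out |= (uint16_t)eq << bit;      /* only the even bit can set */
        }
    }
    return out;                              /* odd bits naturally zero */
}
```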

> +            TYPE nn = *(TYPE *)(vn + H(i));                                \
> +            for (intptr_t j = 0; j < 16; j += sizeof(TYPE)) {              \
> +                TYPE mm = *(TYPE *)(vm + H(i * 16 + j));                   \

mm needs to start at the beginning of the segment, which in this case is
(i & -16).  You don't need the elements of mm in any particular order (all of
them are significant), so you can drop the use of H() here.

Therefore the indexing for mm should be vm + (i & -16) + j.
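To make the arithmetic concrete: `i & -16` clears the low four bits of `i`, rounding the byte offset down to the start of its 16-byte segment, after which `j` walks the segment's elements. A minimal demonstration (the function name is invented for the example):

```c
#include <stdint.h>

/* i & -16 clears the low four bits of i, i.e. rounds the byte offset
 * down to the start of the 16-byte segment containing element i. */
static intptr_t segment_base(intptr_t i)
{
    return i & -16;
}
```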

> +                bool eq = nn == mm;                                        \
> +                if ((eq && EQUALS) || (!eq && !EQUALS)) {                  \
> +                    *pd = (*pd & ~1) | ((1 & EQUALS) | (0 & !EQUALS));     \
> +                }                                                          \

It might be handy to split the inner loop out into a helper function: while
the basic loop is ok, there are tricks that can improve it so that we're
comparing 8 bytes at a time.
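One standard form of that trick (a generic zero-byte bit hack, not QEMU code) is: to test whether byte c occurs anywhere in an 8-byte word v, XOR v with c replicated into every lane, so that a matching lane becomes 0x00, then detect the zero byte with the classic haszero() expression:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: does byte c occur anywhere in the 8-byte word v?
 * A matching lane of the XOR becomes zero; (x - 0x01..) & ~x & 0x80..
 * is nonzero exactly when x contains a zero byte. */
static bool byte_in_u64(uint64_t v, uint8_t c)
{
    uint64_t x = v ^ (0x0101010101010101ull * c);   /* match lane -> 0x00 */
    return ((x - 0x0101010101010101ull) & ~x & 0x8080808080808080ull) != 0;
}
```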


> +static bool do_sve2_zpzz_char_match(DisasContext *s, arg_rprr_esz *a,
> +                                    gen_helper_gvec_4 *fn)
> +{
> +    if (!dc_isar_feature(aa64_sve2, s)) {
> +        return false;
> +    }
> +    if (fn == NULL) {
> +        return false;
> +    }
> +    if (sve_access_check(s)) {
> +        unsigned vsz = vec_full_reg_size(s);
> +        unsigned psz = pred_full_reg_size(s);
> +        int dofs = pred_full_reg_offset(s, a->rd);
> +        int nofs = vec_full_reg_offset(s, a->rn);
> +        int mofs = vec_full_reg_offset(s, a->rm);
> +        int gofs = pred_full_reg_offset(s, a->pg);
> +
> +        /* Save a copy if the destination overwrites the guarding predicate */
> +        int tofs = gofs;
> +        if (a->rd == a->pg) {
> +            tofs = offsetof(CPUARMState, vfp.preg_tmp);
> +            tcg_gen_gvec_mov(0, tofs, gofs, psz, psz);
> +        }
> +
> +        tcg_gen_gvec_4_ool(dofs, nofs, mofs, gofs, vsz, vsz, 0, fn);
> +        do_predtest(s, dofs, tofs, psz / 8);

You can avoid the copy and the predtest by using the iter_predtest_* functions
and returning the flags result directly from the helper.  Again, see 
DO_CMP_PPZZ.
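For reference, the flags the helper would return follow the PTEST rules: N is the first active result bit, Z is set when no active result bit is set, and C is set when the last active result bit is clear. A simplified per-bit sketch of folding that computation into the main loop (QEMU's iter_predtest_* functions operate on whole 64-bit predicate words; the struct and function names here are invented for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified sketch (not QEMU's iter_predtest_fwd) of accumulating
 * PTEST flags while the helper writes the result predicate. */
struct ptest_acc {
    bool seen_active, first, any, last;
};

static void ptest_step(struct ptest_acc *a, bool d_bit, bool g_bit)
{
    if (g_bit) {                       /* only governed elements count */
        if (!a->seen_active) {
            a->first = d_bit;          /* N: first active element */
            a->seen_active = true;
        }
        a->any = a->any || d_bit;      /* !Z: any active element set */
        a->last = d_bit;               /* !C: last active element set */
    }
}

static uint32_t ptest_flags(const struct ptest_acc *a)
{
    uint32_t flags = 0;
    if (a->first) flags |= 1u << 31;   /* N */
    if (!a->any)  flags |= 1u << 30;   /* Z */
    if (!a->last) flags |= 1u << 29;   /* C */
    return flags;                      /* V always clear */
}
```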


r~


