From: Alex Bennée
Subject: Re: [Qemu-devel] [PATCH v5 02/35] target/arm: Implement SVE Contiguous Load, first-fault and no-fault
Date: Wed, 27 Jun 2018 12:37:30 +0100
User-agent: mu4e 1.1.0; emacs 26.1.50
Richard Henderson <address@hidden> writes:
> On 06/26/2018 05:52 AM, Alex Bennée wrote:
>>> +#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \
>>> +static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg, \
>>> +                               target_ulong addr, intptr_t oprsz, \
>>> +                               bool first, uintptr_t ra) \
>>> +{ \
>>> +    intptr_t i = 0; \
>>> +    do { \
>>> +        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
>>> +        do { \
>>> +            TYPEM m = 0; \
>>> +            if (pg & 1) { \
>>> +                if (!first && \
>>> +                    page_check_range(addr, sizeof(TYPEM), PAGE_READ)) { \
>>> +                    record_fault(env, i, oprsz); \
>>> +                    return; \
>>> +                } \
>>> +                m = FN(env, addr, ra); \
>>> +                first = false; \
>>> +            } \
>>> +            *(TYPEE *)(vd + H(i)) = m; \
>>> +            i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
>>> +            addr += sizeof(TYPEM); \
>>> +        } while (i & 15); \
>>> +    } while (i < oprsz); \
>>> +}
>> So I noticed that the disassembly of these two functions is mostly
>> parameter pushing and popping. Is there a case to be made to use the
>> __flatten__ approach and see how the compiler unrolls it all?
>
> Em... for the most part the functions being called are not inlinable,
> being defined in accel/tcg/.
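For context, the `__flatten__` suggestion refers to the GCC/Clang `flatten` function attribute, which asks the compiler to inline every call made from the marked function where possible. The sketch below is a minimal illustration, not QEMU code: `load_u32()` and `sum_words()` are hypothetical names, and `load_u32()` stands in for the accel/tcg loaders. It shows why the attribute only helps when the callee's body is visible in the same translation unit, which is exactly what the real helpers are not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a guest load helper. In QEMU the real
 * loaders are defined in accel/tcg/, i.e. in another translation unit,
 * so 'flatten' on a target/arm helper could not inline them.
 */
static uint32_t load_u32(const uint8_t *mem, uintptr_t addr)
{
    uint32_t v;
    memcpy(&v, mem + addr, sizeof(v)); /* alignment-safe read */
    return v;
}

/*
 * GCC/Clang 'flatten': inline every call made from this body, where
 * possible. It can only do so here because load_u32's definition is
 * visible to the compiler in this file.
 */
__attribute__((flatten))
static uint64_t sum_words(const uint8_t *mem, uintptr_t addr, int n)
{
    uint64_t sum = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        sum += load_u32(mem, addr + 4 * (uintptr_t)i);
    }
    return sum;
}
```

Comparing the `-O2 -S` output with and without the attribute is one way to see whether the calls disappear. For callees in other translation units, presumably only link-time optimization would give the compiler their bodies to inline.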
*sigh* I guess. It's a shame because the numbers get more disappointing:
12:13:48 address@hidden:~/l/q/q/aarch64-linux-user] review/rth-sve-v5(+26/-1) +
./qemu-aarch64 ./tests/simd-memcpy libc intreg intpair simdreg simdpair sve
libc, 248298053, 4228 kb/s
intreg, 646085220, 1623 kb/s
intpair, 369350825, 2841 kb/s
simdreg, 1422096252, 737 kb/s
simdpair, 1369635566, 765 kb/s
sve, 2646179942, 396 kb/s
and the above example doesn't even have the cost of page_check_range. I
guess this can't be improved until other host architectures have a similar
predicated load we could use in generated code. Helpers are always going
to suck here :-/
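To make the first-fault behaviour of the quoted `DO_LDFF1` loop concrete, here is a minimal, self-contained sketch of the same control flow outside QEMU. Everything in it is hypothetical: `guest_mem`, `readable_limit` and `check_range()` stand in for guest memory and `page_check_range()`, and the return value stands in for `record_fault()`/FFR. As in the quoted code, a fault on the first active element is a real fault, while a fault on any later element is suppressed and recorded. Every later active element still pays for a probe, which is the per-element `page_check_range` cost mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest memory: only bytes [0, readable_limit) are readable. */
#define MEM_SIZE 64
static uint8_t guest_mem[MEM_SIZE];
static uintptr_t readable_limit = MEM_SIZE;

/* Stand-in for page_check_range(): 0 if [addr, addr + len) is readable. */
static int check_range(uintptr_t addr, size_t len)
{
    return addr + len <= readable_limit ? 0 : -1;
}

/*
 * First-fault load of 32-bit elements into vd (oprsz bytes).
 * pg has one predicate bit per vector byte; bit 4*k governs element k.
 * Returns oprsz if every element was processed, the byte offset of the
 * first suppressed element (the record_fault()/FFR role), or -1 where
 * a real fault would be raised (the first active element is unreadable).
 */
static intptr_t ldff1_u32(uint32_t *vd, uint64_t pg, uintptr_t addr,
                          intptr_t oprsz)
{
    bool first = true;
    assert(oprsz % 4 == 0 && oprsz <= 64);
    for (intptr_t i = 0; i < oprsz; i += 4, addr += 4) {
        uint32_t m = 0;                 /* inactive elements are zeroed */
        if ((pg >> i) & 1) {
            if (check_range(addr, 4) != 0) {
                /* QEMU skips the probe for the first active element and
                   lets the load itself fault; this sketch must probe to
                   stay in bounds, then reports which case applies. */
                return first ? -1 : i;
            }
            memcpy(&m, guest_mem + addr, sizeof(m));
            first = false;
        }
        vd[i / 4] = m;
    }
    return oprsz;
}
```

This keeps only the skeleton: unlike the quoted macro, it walks one 64-bit predicate rather than 16-bit chunks and ignores the host-endian `H()` adjustments.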
Anyway my boy-racer disappointments aside:
Reviewed-by: Alex Bennée <address@hidden>
--
Alex Bennée