From: Richard Henderson
Subject: Re: [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x
Date: Mon, 8 Aug 2016 10:57:13 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

On 08/07/2016 11:06 PM, Nikunj A Dadhania wrote:
+#define LXV(name, access, swap, type, elems)                  \
+uint64_t helper_##name(CPUPPCState *env,                      \
+                       target_ulong addr)                     \
+{                                                             \
+    type r[elems] = {0};                                      \
+    int i, index, bound, step;                                \
+    if (msr_le) {                                             \
+        index = elems - 1;                                    \
+        bound = -1;                                           \
+        step = -1;                                            \
+    } else {                                                  \
+        index = 0;                                            \
+        bound = elems;                                        \
+        step = 1;                                             \
+    }                                                         \
+                                                              \
+    for (i = index; i != bound; i += step) {                  \
+        if (needs_byteswap(env)) {                            \
+            r[i] = swap(access(env, addr, GETPC()));          \
+        } else {                                              \
+            r[i] = access(env, addr, GETPC());                \
+        }                                                     \
+        addr = addr_add(env, addr, sizeof(type));             \
+    }                                                         \
+    return *((uint64_t *)r);                                  \
+}

This looks more complicated than necessary.

(1) In big-endian mode, surely this simplifies to two 64-bit big-endian loads.

(2) In little-endian mode, the overhead of accessing memory surely dominates, and therefore we should perform two 64-bit loads and manipulate the data after.

AFAICS, this is easiest done by requesting two 64-bit *big-endian* loads, and then swapping bytes. E.g.

    uint64_t helper_bswap16x4(uint64_t x)
    {
        uint64_t m = 0x00ff00ff00ff00ffull;
        return ((x & m) << 8) | ((x >> 8) & m);
    }

    uint64_t helper_bswap32x2(uint64_t x)
    {
        return deposit64(bswap32(x >> 32), 32, 32, bswap32(x));
    }

    tcg_gen_qemu_ld_i64(dest, addr, MO_BEQ, s->mem_index);
    if (ctx->le_mode) {
        gen_helper_bswap16x4(dest, dest);
    }

r~
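
To make the halfword case concrete, here is a minimal standalone sketch (plain C, outside QEMU) of why one big-endian 64-bit load followed by the 16x4 byte swap should match four little-endian 16-bit element loads. The names load_be64, load_le16x4_ref and self_check are hypothetical and exist only for this illustration, and the reference loader assumes element 0 lands in the most significant 16 bits of the doubleword, which is what the approach above implies rather than something checked against the ISA text here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit big-endian load (MO_BEQ):
 * assemble the value byte by byte so the result does not depend
 * on host endianness. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Swap the two bytes inside each of the four 16-bit lanes
 * (same expression as helper_bswap16x4 in the mail). */
static uint64_t bswap16x4(uint64_t x)
{
    uint64_t m = 0x00ff00ff00ff00ffull;
    return ((x & m) << 8) | ((x >> 8) & m);
}

/* Reference path: four separate little-endian 16-bit loads.
 * Assumption: element 0 goes into the most significant lane. */
static uint64_t load_le16x4_ref(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t e = (uint16_t)(p[2 * i] | (p[2 * i + 1] << 8));
        v = (v << 16) | e;
    }
    return v;
}

/* Check that the wide load plus lane swap matches the reference. */
static void self_check(void)
{
    const uint8_t mem[8] = { 0x00, 0x11, 0x22, 0x33,
                             0x44, 0x55, 0x66, 0x77 };

    assert(load_be64(mem) == 0x0011223344556677ull);
    assert(bswap16x4(load_be64(mem)) == load_le16x4_ref(mem));
}
```

Calling self_check() from a small test main exercises both paths on the same eight bytes. The point of the design is that memory is touched once per doubleword and the per-element fixup is pure register arithmetic.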