Re: [Qemu-devel] [PATCH 32/32] ppc: Speed up load/store multiple
From: David Gibson
Subject: Re: [Qemu-devel] [PATCH 32/32] ppc: Speed up load/store multiple
Date: Wed, 27 Jul 2016 12:47:13 +1000
User-agent: Mutt/1.6.2 (2016-07-01)

On Wed, Jul 27, 2016 at 08:21:26AM +1000, Benjamin Herrenschmidt wrote:
> Use a single translate when not crossing a page boundary and avoid
> going through layers of helpers. MacOS uses those instructions
> a lot, so does OpenBIOS.
>
> Signed-off-by: Benjamin Herrenschmidt <address@hidden>
> ---
> target-ppc/mem_helper.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 69 insertions(+)
>
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index da3f973..511079b 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -53,8 +53,48 @@ static inline target_ulong addr_add(CPUPPCState *env, target_ulong addr,
> }
> }
>
> +/* Reduce the length so that addr + len doesn't cross a page boundary. */
> +static inline uint64_t adj_len_to_page(uint64_t len, uint64_t addr)
> +{
> +#ifndef CONFIG_USER_ONLY
> + if ((addr & ~TARGET_PAGE_MASK) + len - 1 >= TARGET_PAGE_SIZE) {
> + return -addr & ~TARGET_PAGE_MASK;
> + }
> +#endif
> + return len;
> +}
> +
> void helper_lmw(CPUPPCState *env, target_ulong addr, uint32_t reg)
> {
> + uint32_t *src;
> + uint64_t len, adjlen;
> +
> + if ((addr & 3)) {
> + goto fallback;
> + }
> + len = (32 - reg) << 2;
> + while (len) {
> + src = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, env->dmmu_idx);
> + if (!src) {
> + goto fallback;
> + }
> + adjlen = adj_len_to_page(len, addr);
> + len -= adjlen;
> +#if defined(HOST_WORDS_BIGENDIAN)
> + memcpy(&env->gpr[reg], src, adjlen);
> + reg += (adjlen >> 2);
> + addr = addr_add(env, addr, adjlen);
> +#else
> + while(adjlen) {
> + env->gpr[reg++] = bswap32(*(src++));
> + adjlen -= 4;
> + addr = addr_add(env, addr, 4);
> + }
> +#endif

Would it improve this any further to do the memcpy() unconditionally,
then byteswap the GPRs in-place for the LE host case?

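To make the suggestion above concrete, here is a rough standalone sketch of
"copy first, swap in place". It is not QEMU code: load_words_be(),
sketch_bswap32() and host_is_little_endian() are hypothetical names, the
runtime endianness probe stands in for HOST_WORDS_BIGENDIAN, and the register
array is assumed to hold exactly 32 bits per slot. If the slots were wider
(for example a 64-bit target_ulong), a straight memcpy() would not place one
word per register, so this would need adapting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for QEMU's bswap32(). */
static uint32_t sketch_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Runtime probe, used here in place of the HOST_WORDS_BIGENDIAN
 * compile-time check. Returns nonzero on a little-endian host. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Copy nwords big-endian 32-bit words from src into regs with a single
 * memcpy(), then fix the byte order in place on little-endian hosts.
 * Assumption: each register slot is exactly 32 bits wide. */
static void load_words_be(uint32_t *regs, const void *src, size_t nwords)
{
    size_t i;

    assert(regs != NULL && src != NULL);
    memcpy(regs, src, nwords * sizeof(uint32_t));
    if (host_is_little_endian()) {
        for (i = 0; i < nwords; i++) {
            regs[i] = sketch_bswap32(regs[i]);
        }
    }
}
```

Calling load_words_be() on the bytes 11 22 33 44 aa bb cc dd should leave
0x11223344 and 0xaabbccdd in the first two slots on either kind of host.
Whether this actually beats the per-word loop would need measuring.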
> + }
> + return;
> +
> + fallback:
> for (; reg < 32; reg++) {
> if (needs_byteswap(env)) {
> env->gpr[reg] = bswap32(cpu_ldl_data_ra(env, addr, GETPC()));
> @@ -67,6 +107,35 @@ void helper_lmw(CPUPPCState *env, target_ulong addr, uint32_t reg)
>
> void helper_stmw(CPUPPCState *env, target_ulong addr, uint32_t reg)
> {
> + uint32_t *dst;
> + uint64_t len, adjlen;
> +
> + if ((addr & 3)) {
> + goto fallback;
> + }
> + len = (32 - reg) << 2;
> + while (len) {
> + dst = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, env->dmmu_idx);
> + if (!dst) {
> + goto fallback;
> + }
> + adjlen = adj_len_to_page(len, addr);
> + len -= adjlen;
> +#if defined(HOST_WORDS_BIGENDIAN)
> + memcpy(dst, &env->gpr[reg], adjlen);
> + reg += (adjlen >> 2);
> + addr = addr_add(env, addr, adjlen);
> +#else
> + while(adjlen) {
> + *(dst++) = bswap32(env->gpr[reg++]);
> + adjlen -= 4;
> + addr = addr_add(env, addr, 4);
> + }
> +#endif
> + }
> + return;
> +
> + fallback:
> for (; reg < 32; reg++) {
> if (needs_byteswap(env)) {
> cpu_stl_data_ra(env, addr, bswap32((uint32_t)env->gpr[reg]),
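
For reference, the page-boundary clamping that the quoted patch relies on can
be tried out in isolation. The sketch below is only an illustration under
stated assumptions: the 4 KiB page size and the names SKETCH_PAGE_SIZE,
SKETCH_PAGE_MASK, clamp_len_to_page() and count_page_chunks() are made up
here, and the clamp is written as "page size minus offset", which matches the
patch's `-addr & ~TARGET_PAGE_MASK` whenever the offset within the page is
nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB page; the real TARGET_PAGE_SIZE depends on the target. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)
/* Like TARGET_PAGE_MASK: keeps the page number, clears the offset bits. */
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Shorten len so that [addr, addr + len) stays within one page. */
static uint64_t clamp_len_to_page(uint64_t len, uint64_t addr)
{
    uint64_t offset = addr & ~SKETCH_PAGE_MASK;

    assert(len > 0);
    if (offset + len - 1 >= SKETCH_PAGE_SIZE) {
        return SKETCH_PAGE_SIZE - offset;
    }
    return len;
}

/* Count how many per-page chunks (and therefore address translations)
 * a transfer of len bytes starting at addr would need. */
static unsigned count_page_chunks(uint64_t addr, uint64_t len)
{
    unsigned chunks = 0;

    while (len) {
        uint64_t adj = clamp_len_to_page(len, addr);

        len -= adj;
        addr += adj;
        chunks++;
    }
    return chunks;
}
```

With these assumptions, a full 128-byte transfer starting at 0x1fc0 is split
into two 64-byte chunks, while the same transfer starting at 0x1000 needs
only one.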
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson