[Qemu-commits] [qemu/qemu] 6b8b62: cputlb: Make store_helper less fragile to compiler...
From: Peter Maydell
Subject: [Qemu-commits] [qemu/qemu] 6b8b62: cputlb: Make store_helper less fragile to compiler...
Date: Sun, 06 Sep 2020 06:15:28 -0700
Branch: refs/heads/master
Home: https://github.com/qemu/qemu
Commit: 6b8b622e87e2cb4b22113f2bdebf18c78f5905ee
https://github.com/qemu/qemu/commit/6b8b622e87e2cb4b22113f2bdebf18c78f5905ee
Author: Richard Henderson <richard.henderson@linaro.org>
Date: 2020-09-03 (Thu, 03 Sep 2020)
Changed paths:
M accel/tcg/cputlb.c
Log Message:
-----------
cputlb: Make store_helper less fragile to compiler optimizations
This has no functional change.
The current function structure is:
    inline QEMU_ALWAYSINLINE
    store_memop() {
        switch () {
        ...
        default:
            qemu_build_not_reached();
        }
    }

    inline QEMU_ALWAYSINLINE
    store_helper() {
        ...
        if (span_two_pages_or_io) {
            ...
            helper_ret_stb_mmu();
        }
        store_memop();
    }

    helper_ret_stb_mmu() {
        store_helper();
    }
Whereas GCC will generate an error at compile-time when an always_inline
function is not inlined, Clang does not. Nor does Clang prioritize the
inlining of always_inline functions. Both of these are arguably bugs.
Both `store_memop` and `store_helper` need to be inlined and allow
constant propagation to eliminate the `qemu_build_not_reached` call.
However, if the compiler instead chooses to inline helper_ret_stb_mmu
into store_helper, then store_helper is now self-recursive and the
compiler is no longer able to propagate the constant in the same way.
This does not reproduce at current QEMU head, but was reproducible
at v4.2.0 with `clang-10 -O2 -fexperimental-new-pass-manager`.
The inline recursion problem can be fixed solely by marking
helper_ret_stb_mmu as noinline, so the compiler does not make an
incorrect decision about which functions to inline.
In addition, extract store_helper_unaligned as a noinline subroutine
that can be shared by all of the helpers. This saves about 6k code
size in an optimized x86_64 build.
Reported-by: Shu-Chun Weng <scw@google.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit: e7e8f33fb603c3bfa0479d7d924f2ad676a84317
https://github.com/qemu/qemu/commit/e7e8f33fb603c3bfa0479d7d924f2ad676a84317
Author: Stephen Long <steplong@quicinc.com>
Date: 2020-09-03 (Thu, 03 Sep 2020)
Changed paths:
M tcg/tcg-op-gvec.c
Log Message:
-----------
tcg: Fix tcg gen for vectorized absolute value
The fallback inline expansion for vectorized absolute value, used
when the host doesn't support such an insn, was flawed.
E.g. when a vector of bytes has all elements negative, mask
will be 0xffff_ffff_ffff_ffff. Subtracting mask only adds 1
to the low element instead of all elements, because -mask is 1
and not 0x0101_0101_0101_0101.
Signed-off-by: Stephen Long <steplong@quicinc.com>
Message-Id: <20200813161818.190-1-steplong@quicinc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit: 4ca3d09cd9b2046984966ef430cca4572ae0a925
https://github.com/qemu/qemu/commit/4ca3d09cd9b2046984966ef430cca4572ae0a925
Author: Richard Henderson <richard.henderson@linaro.org>
Date: 2020-09-03 (Thu, 03 Sep 2020)
Changed paths:
M softmmu/cpus.c
Log Message:
-----------
softmmu/cpus: Only set parallel_cpus for SMP
Do not set parallel_cpus if there is only one cpu instantiated.
This will allow tcg to use serial code to implement atomics.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit: 6a17646176e011ddc463a2870a64c7aaccfe9c50
https://github.com/qemu/qemu/commit/6a17646176e011ddc463a2870a64c7aaccfe9c50
Author: Richard Henderson <richard.henderson@linaro.org>
Date: 2020-09-03 (Thu, 03 Sep 2020)
Changed paths:
M tcg/tcg-op-gvec.c
Log Message:
-----------
tcg: Eliminate one store for in-place 128-bit dup_mem
Do not store back to the exact memory from which we just loaded.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit: fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
https://github.com/qemu/qemu/commit/fe4b0b5bfa96c38ad1cad0689a86cca9f307e353
Author: Richard Henderson <richard.henderson@linaro.org>
Date: 2020-09-03 (Thu, 03 Sep 2020)
Changed paths:
M tcg/tcg-op-gvec.c
Log Message:
-----------
tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
We already support duplication of 128-bit blocks. This extends
that support to 256-bit blocks. This will be needed by SVE2.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Commit: 227de21ed0759e275a469394af72c999d0134bb5
https://github.com/qemu/qemu/commit/227de21ed0759e275a469394af72c999d0134bb5
Author: Peter Maydell <peter.maydell@linaro.org>
Date: 2020-09-05 (Sat, 05 Sep 2020)
Changed paths:
M accel/tcg/cputlb.c
M softmmu/cpus.c
M tcg/tcg-op-gvec.c
Log Message:
-----------
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20200903' into staging
Improve inlining in cputlb.c.
Fix vector abs fallback.
Only set parallel_cpus for SMP.
Add vector dupm for 256-bit elements.
# gpg: Signature made Thu 03 Sep 2020 22:38:25 BST
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F
* remotes/rth/tags/pull-tcg-20200903:
tcg: Implement 256-bit dup for tcg_gen_gvec_dup_mem
tcg: Eliminate one store for in-place 128-bit dup_mem
softmmu/cpus: Only set parallel_cpus for SMP
tcg: Fix tcg gen for vectorized absolute value
cputlb: Make store_helper less fragile to compiler optimizations
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Compare: https://github.com/qemu/qemu/compare/8ca019b9c9ff...227de21ed075