Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
From: Daniel Jacobowitz
Subject: Re: [Qemu-devel] [PATCH] Huge TLB performance improvement
Date: Sat, 11 Nov 2006 20:10:35 -0500
User-agent: Mutt/1.5.13 (2006-08-11)
On Sun, Nov 05, 2006 at 10:38:20AM -0500, Daniel Jacobowitz wrote:
> On Mon, Mar 06, 2006 at 02:59:29PM +0000, Thiemo Seufer wrote:
> > Hello All,
> >
> > this patch vastly improves TLB performance on MIPS, and probably also
> > on other architectures. I measured a Linux boot-shutdown cycle,
> > including userland init.
>
> Quoting the whole message since this is from March...
>
> I don't remember seeing any followup discussion of this patch, but I
> may have missed it. Thiemo's definitely right about "vastly". Is this
> patch appropriate, or would anyone care to suggest a more
> sophisticated data structure to avoid the full cache invalidate?
This patch is an even nicer alternative, I think. I benchmarked four
alternatives (several times each):
Straight qemu with my previously posted MIPS patches takes 6:13 to
start and reboot a MIPS userspace (through init, so lots of fork/exec).

Thiemo's patch, which flushes the whole jump buffer, cuts it to 1:40.

A patch which more efficiently finds the entries that need to be
flushed cuts it to 1:21.

A patch which indiscriminately flushes up to 1/32nd of the jump buffer
cuts it to 1:11-1:13.
Here's that last patch. It changes the hash function so that entries
from a particular page are always grouped together in tb_jmp_cache.
tlb_flush_page then finds the two ranges that may hold affected entries
(the flushed page's and the preceding page's, since a TB can straddle
the page boundary) and memsets them clear.

Thoughts? Is this acceptable, and where else should it be tested besides
MIPS? I haven't fine-tuned the numbers; it currently allows at most 64
cached jump targets per target page, but that could be made higher or
lower.
--
Daniel Jacobowitz
CodeSourcery
---
cpu-defs.h | 5 +++++
exec-all.h | 12 +++++++++++-
exec.c | 15 +++++++--------
3 files changed, 23 insertions(+), 9 deletions(-)
Index: qemu/cpu-defs.h
===================================================================
--- qemu.orig/cpu-defs.h 2006-11-11 15:12:26.000000000 -0500
+++ qemu/cpu-defs.h 2006-11-11 15:12:33.000000000 -0500
@@ -80,6 +80,11 @@ typedef unsigned long ram_addr_t;
#define TB_JMP_CACHE_BITS 12
#define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
+#define TB_JMP_PAGE_BITS (TB_JMP_CACHE_BITS / 2)
+#define TB_JMP_PAGE_SIZE (1 << TB_JMP_PAGE_BITS)
+#define TB_JMP_ADDR_MASK (TB_JMP_PAGE_SIZE - 1)
+#define TB_JMP_PAGE_MASK (TB_JMP_ADDR_MASK << TB_JMP_PAGE_BITS)
+
#define CPU_TLB_BITS 8
#define CPU_TLB_SIZE (1 << CPU_TLB_BITS)
Index: qemu/exec-all.h
===================================================================
--- qemu.orig/exec-all.h 2006-11-11 15:12:26.000000000 -0500
+++ qemu/exec-all.h 2006-11-11 19:56:36.000000000 -0500
@@ -196,9 +196,19 @@ typedef struct TranslationBlock {
struct TranslationBlock *jmp_first;
} TranslationBlock;
+static inline unsigned int tb_jmp_cache_hash_page(target_ulong pc)
+{
+ target_ulong tmp;
+ tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
+ return (tmp >> TB_JMP_PAGE_BITS) & TB_JMP_PAGE_MASK;
+}
+
static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc)
{
- return (pc ^ (pc >> TB_JMP_CACHE_BITS)) & (TB_JMP_CACHE_SIZE - 1);
+ target_ulong tmp;
+ tmp = pc ^ (pc >> (TARGET_PAGE_BITS - TB_JMP_PAGE_BITS));
+ return (((tmp >> TB_JMP_PAGE_BITS) & TB_JMP_PAGE_MASK) |
+ (tmp & TB_JMP_ADDR_MASK));
}
static inline unsigned int tb_phys_hash_func(unsigned long pc)
Index: qemu/exec.c
===================================================================
--- qemu.orig/exec.c 2006-11-11 15:12:26.000000000 -0500
+++ qemu/exec.c 2006-11-11 19:39:45.000000000 -0500
@@ -1299,14 +1299,13 @@ void tlb_flush_page(CPUState *env, targe
tlb_flush_entry(&env->tlb_table[0][i], addr);
tlb_flush_entry(&env->tlb_table[1][i], addr);
- for(i = 0; i < TB_JMP_CACHE_SIZE; i++) {
- tb = env->tb_jmp_cache[i];
- if (tb &&
- ((tb->pc & TARGET_PAGE_MASK) == addr ||
- ((tb->pc + tb->size - 1) & TARGET_PAGE_MASK) == addr)) {
- env->tb_jmp_cache[i] = NULL;
- }
- }
+ /* Discard jump cache entries for any tb which might potentially
+ overlap the flushed page. */
+ i = tb_jmp_cache_hash_page(addr - TARGET_PAGE_SIZE);
+ memset (&env->tb_jmp_cache[i], 0, TB_JMP_PAGE_SIZE * sizeof(tb));
+
+ i = tb_jmp_cache_hash_page(addr);
+ memset (&env->tb_jmp_cache[i], 0, TB_JMP_PAGE_SIZE * sizeof(tb));
#if !defined(CONFIG_SOFTMMU)
if (addr < MMAP_AREA_END)