Re: [Qemu-devel] [RFC 00/20] Do away with TB retranslation
From: Aurelien Jarno
Subject: Re: [Qemu-devel] [RFC 00/20] Do away with TB retranslation
Date: Sun, 13 Sep 2015 23:00:53 +0200
User-agent: Mutt/1.5.23 (2014-03-12)
On 2015-09-10 19:48, Aurelien Jarno wrote:
> On 2015-09-01 22:51, Richard Henderson wrote:
> > I've been looking at this problem off and on for the last week or so,
> > prompted by the sparc performance work. Although I haven't been able
> > to get a proper sparc64 guest install working, I see the exact same
> > problem with a mips guest.
> >
> > On alpha or x86, which seem to perform well, perf numbers for the
> > executable have about 30% of the execution time spent in cpu_exec.
> > For mips, on the other hand, we spend about 30% of the time in
> > routines related to tcg (re-)translation.
>
> Indeed the problem happens on CPUs which implement the MMU as a
> "software assisted TLB" (or any other marketing name), as opposed to
> hardware page walk MMU. They can hold a limited number of TLB entries
> at a given time, and require the OS to do the page walk to refill the
> TLB. For that an exception is generated, and the faulting address has
> to be determined. That's where the TB retranslation takes place, and
> that's why it happens a lot more on these CPUs.
>
> A few years ago, I measured about 45% of the TB translation actually
> being retranslation for mips and 60% for SH4 for a standard workload.
> For comparison, these values are around 1% on i386 and around 5% on ARM.
>
> That's why each time we add an optimization to the optimizer, we get
> faster code, but we might lose overall because it takes longer to generate.
>
> > Aurelien has a patch in his own branches that attempts to mitigate this
> > on mips by shadow caching more tlb entries. While this does improve
> > performance a bit, it employs a linear search through a large buffer,
> > with the effect of 30-ish % perf numbers for r4k_map_address.
> > (One could probably improve things by hashing the data in that array,
> > rather than a linear search, but...)
>
> Yes, that is just a workaround and probably highly workload dependent,
> that's why I never submitted it.
>
> > In the past we've talked about getting rid of retranslation entirely.
> > It's clever, but it certainly has its share of problems. I gave it
> > a go this weekend.
>
> Really great that you have been able to implement that.
>
> > The following isn't quite right. It fails to boot on sparc even with
> > our tiny test kernel. It also triggers an abort on mips, eventually.
> > But it's able to get all the way through to a prompt, and in the
> > process I can see that perf results are quite different -- much more
> > like results I see for alpha.
> >
> > Thoughts on the approach?
>
> It looks like the approach we discussed with Paolo back in June:
>
> http://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg04885.html
>
> To me it looks like the right way to proceed, we just have to take care
> that the information to store does not take too much space compared to
> the actual translated code.
>
> I'll take a look and test it asap.
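
To make the idea of storing lookup information next to the translated
code concrete, here is a minimal sketch. It is an illustration only and
not the code from the series: the InsnMap record, its fields and
find_guest_pc() are hypothetical names, and it assumes one fixed-size
record per guest instruction, sorted by host offset. A real
implementation would probably want a more compact encoding, given the
space concern above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-instruction record kept alongside a TB.
 * This is NOT QEMU's actual layout, only an illustration. */
typedef struct {
    uint32_t host_end;  /* host code offset just past this guest insn */
    uint64_t guest_pc;  /* guest PC of this instruction */
} InsnMap;

/* Find the guest PC of the instruction whose host code covers host_off.
 * The table must be sorted by host_end. Returns 0 on success and -1 if
 * host_off lies beyond the end of the block. */
static int find_guest_pc(const InsnMap *map, size_t n,
                         uint32_t host_off, uint64_t *guest_pc)
{
    size_t lo = 0, hi = n;

    assert(map != NULL || n == 0);

    /* Binary search for the first entry with host_end > host_off. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].host_end > host_off) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    if (lo == n) {
        return -1;
    }
    *guest_pc = map[lo].guest_pc;
    return 0;
}
```

With such a table, handling a fault only needs a lookup from the host
offset to the guest PC, instead of translating the block a second time.
The cost is the extra memory per guest instruction.
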
I haven't really reviewed the code yet, but I have been able to test
your tcg-search-2 branch.
First of all I have tested half of the targets (alpha, arm, cris, i386,
mips, ppc, s390x, sh4 and sparc), and I haven't noticed any regression.
They now have more than 50 hours of uptime, some of them have been
building stuff most of the time, so they are quite stable. That said,
I have only tested your branch on an x86-64 host, and it might be a
good idea to test it on one or two different host architectures (I put
that on my todo list, but no promises there).
On the performance side, I have done real measurements only on i386 and
mips. On i386, I haven't seen any measurable difference. On mips, the
boot time is unchanged, but some workloads are considerably faster. The
best I have measured is on perl code, with a 2.4x improvement, while
on an average workload, the gain is around 1.5x.
With all that said, you can get:
Tested-by: Aurelien Jarno <address@hidden>
I hope to give you the corresponding Reviewed-by in the next few days.
Aurelien
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
address@hidden http://www.aurel32.net