Re: [Qemu-devel] SH: Fix movca.l/ocbi emulation.
From: Vladimir Prus
Subject: Re: [Qemu-devel] SH: Fix movca.l/ocbi emulation.
Date: Wed, 1 Apr 2009 00:58:06 +0400
User-agent: KMail/1.11.90 (Linux/2.6.24-24-generic; KDE/4.2.65; i686; svn-944167; 2009-03-23)
On Wednesday 14 January 2009 23:45:11 Edgar E. Iglesias wrote:
> On Thu, Dec 11, 2008 at 09:34:30PM +0300, Vladimir Prus wrote:
> >
> > This patch fixes the emulation of the movca.l and ocbi
> > instructions. movca.l is documented to allocate a cache
> > line and write into it. ocbi invalidates a cache line.
> > So, given code like:
> >
> > asm volatile("movca.l r0, @%0\n\t"
> > "movca.l r0, @%1\n\t"
> > "ocbi @%0\n\t"
> > "ocbi @%1" : :
> > "r" (a0), "r" (a1));
> >
> > Nothing is actually written to memory. Code like this can be
> > found in arch/sh/mm/cache-sh4.c and is used to flush the cache.
> >
> > Current QEMU implements movca.l the same way as an ordinary
> > move, and the code above just corrupts memory. Doing full cache
> > emulation is out of the question, so this patch implements a hack.
> > Stores done by movca.l are delayed. If we execute an instruction
> > that is neither movca.l nor ocbi, we flush all pending stores. If
> > we execute ocbi, we look at the pending stores and delete any store
> > to the invalidated address. This appears to work fairly well in
> > practice.
> >
> > - Volodya
>
> Hello and thanks for the patch.
>
> If you wonder about the sudden late interest in this, see the following
> thread :)
> http://lists.gnu.org/archive/html/qemu-devel/2009-01/msg00460.html
Hi Edgar,
apologies for the belated reply.
> I've got a few comments on your patch. Let's start with the general ones.
>
> It would be good if the patch handled the cases where the data cache is
> disabled or put into write-through mode. There are also memory areas that
> are not cacheable (a TLB feature); supporting the latter can get messy for
> very little win, and IMO is not needed.
I am afraid I might not be the best person to accurately catch all the cases
where the cache might be enabled or disabled. KAWASAKI-san has posted a
function that can be used to check whether caching is enabled, in:
http://article.gmane.org/gmane.comp.emulators.qemu/36348
Does that one address your suggestion? If so, I can incorporate that function
and add a check based on it.
> In the other thread discussing this issue, I raised concerns about the
> delayed stores being issued on insns where they normally wouldn't be. Let
> me give you an example of the kind of thing I'm thinking of.
>
> movca
> --- <-- Page boundary, broken TB.
> mov <-- I-TLB happens to miss. SH jumps to the TLB miss handler.
> xxx <-- The I-TLB refill handler's first insn issues the delayed store,
>         and the delayed store happens to fault!
>
> That scenario could cause a D-TLB fault inside an I-TLB refill handler.
> Some archs don't handle that. If it's a real problem for sh (which I
> believe it is), you could turn things around and make movca into a load +
> store, saving the overwritten memory value. ocbi could then just bring
> back the old state. I believe that approach would not break the exception
> rules of sh.
I think your approach will indeed be more reliable, so I've reworked my patch
accordingly.
> > +void helper_do_stores(void)
> > +{
> > + store_request_t *current = env->store_requests;
> > +
> > + while(current)
> > + {
> > + uint32_t a = current->address, v = current->value;
> > + store_request_t *next = current->next;
> > + free (current);
> > + env->store_requests = current = next;
> > + if (current == 0)
> > + env->store_request_tail = &(env->store_requests);
> > +
> > + stl_data(a, v);
>
> Shouldn't you remove the delayed store record _after_ stl() returns?
> Otherwise you might lose delayed stores in case of TLB exceptions, or?
I recall that I explicitly used this ordering, but I no longer remember
exactly why. Anyway, this is not an issue after the patch is revised as
you suggested above.
>
> > + }
> > +}
> > +
> > +void helper_ocbi(uint32_t address)
> > +{
> > + store_request_t **current = &(env->store_requests);
> > + while (*current)
> > + {
> > + if ((*current)->address == address)
> > + {
>
>
> Nitpick:
> It may not be very significant, but you can easily catch a few more
> movca+ocbi use cases by ignoring line offsets in the comparison, i.e. mask
> off the last five bits of the compared addresses to zap all recorded
> movcas made to the same cache line the ocbi is invalidating.
I did so.
> > static void _decode_opc(DisasContext * ctx)
> > {
> > + /* This code tries to make movcal emulation sufficiently
> > + accurate for Linux purposes. This instruction writes
> > + memory, and prior to that, always allocates a cache line.
> > + It is used in two contexts:
> > + - in memcpy, where data is copied in blocks, the first write
> > + to a block uses movca.l. I presume this is because writing
> > + all the data into the cache, and having it sent to memory
> > + later via the store buffer, is faster than waiting for the
> > + memory write on each store, as a write-through cache
> > + configuration would.
>
> I don't think this is the reason for using movca. In fact, according to
> the documentation I've got, movca behaves like a normal mov for caches in
> write-through mode. I believe the reason is to avoid the line fetch when
> the store from a normal mov misses the cache in write-back mode. Anyway,
> there's no reason to speculate; IMO we can just remove the comment on why
> and leave the note about it being used for fast block copies.
OK.
I am attaching a revised patch -- what do you think?
- Volodya
Attachment: movcal-ocbi.diff (Text Data)