Re: [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION

qemu-devel

From:	Richard Henderson
Subject:	Re: [Qemu-devel] [PATCH v3 20/20] tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
Date:	Thu, 28 Mar 2013 10:46:46 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4

On 03/28/2013 09:44 AM, Peter Maydell wrote:
>> +        /* Prior to that the assembler uses mov r0, r0.  Unlike the nop
>> +           above, this is guaranteed to consume execution resources.  */
> 
> Guaranteed by who? Catching this case in the decoder and treating it
> exactly like NOP is a perfectly legal implementation.
> (For that matter there's nothing restricting an implementation of
> the architectural NOP from tying up every execution resource on
> the core for 500 cycles.)

Hmph, I could have sworn I saw language exactly like that in the AARM,
but I can't find it anymore.  I do see a note about not using NOP in
timing loops in A8.8.119.

As for timing on real hardware, a loop like

1:      subs    r0, r0, #1
        mov     r0, r0
        mov     r0, r0
        mov     r0, r0
        mov     r0, r0
        mov     r0, r0
        mov     r0, r0
        bne     1b

runs in 7 cycles per iteration on Cortex-A15, whereas the same loop with nops
runs in 6.  Of course, the same loop using "mov r1, r1", which doesn't conflict
with the subs in the first cycle, also runs in 6 cycles.  So it's all about
finding a nop that doesn't have a RAW (read-after-write) conflict with the
previous insn.
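To make the register-choice point concrete, here is a minimal C sketch of how a code emitter might produce the two A32 filler encodings discussed in this thread (the NOP hint and "mov rN, rN") and pick a filler register that avoids a RAW dependency on the previous instruction's destination. The helper names (encode_nop_hint, encode_mov_reg, pick_filler_reg, encode_filler_nop) are hypothetical, not tcg-arm's actual functions. The r0/r1 rule simply generalizes the single Cortex-A15 measurement above, and the architecture guarantees nothing about the timing of either form.

```c
#include <assert.h>
#include <stdint.h>

/* A32 condition field value for "always" (AL). */
#define COND_AL ((uint32_t)0xE)

/* Architectural NOP hint, "nop" (ARMv6K / ARMv6T2 and later). */
uint32_t encode_nop_hint(void)
{
    return (COND_AL << 28) | 0x0320F000u;
}

/* "mov rd, rm": data-processing MOV, register operand, no shift, S=0. */
uint32_t encode_mov_reg(unsigned rd, unsigned rm)
{
    assert(rd < 16 && rm < 16);
    return (COND_AL << 28) | 0x01A00000u | ((uint32_t)rd << 12) | rm;
}

/* Hypothetical helper: choose the register for a "mov rN, rN" filler.
   prev_dest is the core register written by the previous instruction,
   or -1 if it writes none.  r0 is the traditional choice; switch to r1
   when the previous insn wrote r0, so the filler does not read a value
   that was just produced (no read-after-write dependency). */
unsigned pick_filler_reg(int prev_dest)
{
    return prev_dest == 0 ? 1u : 0u;
}

/* Hypothetical helper: encode a dependency-avoiding "mov rN, rN". */
uint32_t encode_filler_nop(int prev_dest)
{
    unsigned r = pick_filler_reg(prev_dest);
    return encode_mov_reg(r, r);
}
```

For example, encode_nop_hint() gives 0xE320F000 and encode_mov_reg(0, 0) gives 0xE1A00000, the "mov r0, r0" filler the quoted patch comment says the assembler uses on older cores. After a "subs r0, r0, #1", encode_filler_nop(0) picks r1 and yields 0xE1A01001. Whether avoiding the dependency saves a cycle is microarchitecture-specific; the only data point here is the Cortex-A15 loop above.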

I don't have any other ARM hw readily available.

r~
