Re: [avr-libc-dev] Can pgmspace.h __LPM_xxx__ macros become inlinefn's?
From: Bill Somerville
Date: Tue, 05 Oct 2004 14:57:51 +0100
"Theodore A. Roth" wrote:
>
> On Fri, 1 Oct 2004, Bill Somerville wrote:
>
> > > I don't see anywhere that using static is not recommended. Do you have a
> > > reference for that?
> >
> > The penultimate para of the gcc man page "5.34 An Inline Function is As
> > Fast As a Macro" seemed to imply this, but after comments from Geoffrey
> > Wossum and some tests it seems that static __inline__ or extern
> > __inline__ are the only options in header files otherwise multiple
> > definitions occur.
> >
> > Unfortunately this has become academic as I cannot get the inline fn's
> > to generate the same code as the macros, also the inline versions
> > sometimes are bigger. This seems to be an optimiser problem where the
> > register choices made around inlined fn's are not as smart as they might
> > be. I suspect this is a quite obscure gcc bug/feature. The gcc man page
> > says that inlines may generate different code from macros (both larger
> > and smaller).
> >
> > Since I haven't found an example that generates smaller code, I suspect
> > that a community of embedded programmers are not going to be happy with
> > this change!
>
> <snip>
>
> >
> > In the inline version the second LPM result does an unnecessary register
> > shuffle that the macro version avoids. Note that the first LPM is OK so
> > the compiler can get it right sometimes.
>
> Does changing "__asm__" to "__asm__ __volatile__" affect your results?
Yes, but it's not an improvement!
Here are the dumps of the relevant bits (same test code and compiler
switches as in the previous mail):
Current macro version:-
=======================
00000056 <main>:
56: cf e5 ldi r28, 0x5F ; 95
58: d2 e0 ldi r29, 0x02 ; 2
5a: de bf out 0x3e, r29 ; 62
5c: cd bf out 0x3d, r28 ; 61
5e: ea e1 ldi r30, 0x1A ; 26
60: f0 e0 ldi r31, 0x00 ; 0
62: c8 95 lpm
64: 40 2d mov r20, r0
66: 31 96 adiw r30, 0x01 ; 1
68: c8 95 lpm
6a: 80 2d mov r24, r0
6c: 28 2f mov r18, r24
6e: 33 27 eor r19, r19
70: 82 2f mov r24, r18
72: 99 27 eor r25, r25
74: 26 95 lsr r18
76: 26 95 lsr r18
78: 82 1b sub r24, r18
7a: 91 09 sbc r25, r1
7c: 84 0f add r24, r20
7e: 91 1d adc r25, r1
80: 00 c0 rjmp .+0 ; 0x82
Inline version with __asm__:-
=============================
00000056 <main>:
56: cf e5 ldi r28, 0x5F ; 95
58: d2 e0 ldi r29, 0x02 ; 2
5a: de bf out 0x3e, r29 ; 62
5c: cd bf out 0x3d, r28 ; 61
5e: ea e1 ldi r30, 0x1A ; 26
60: f0 e0 ldi r31, 0x00 ; 0
62: c8 95 lpm
64: 30 2d mov r19, r0
66: 31 96 adiw r30, 0x01 ; 1
68: c8 95 lpm
6a: 20 2d mov r18, r0
6c: 82 2f mov r24, r18
6e: 99 27 eor r25, r25
70: 26 95 lsr r18
72: 26 95 lsr r18
74: 82 1b sub r24, r18
76: 91 09 sbc r25, r1
78: 83 0f add r24, r19
7a: 91 1d adc r25, r1
7c: 00 c0 rjmp .+0 ; 0x7e
Inline version with __asm__ __volatile__:-
==========================================
00000056 <main>:
56: cf e5 ldi r28, 0x5F ; 95
58: d2 e0 ldi r29, 0x02 ; 2
5a: de bf out 0x3e, r29 ; 62
5c: cd bf out 0x3d, r28 ; 61
5e: ea e1 ldi r30, 0x1A ; 26
60: f0 e0 ldi r31, 0x00 ; 0
62: c8 95 lpm
64: 80 2d mov r24, r0
66: 48 2f mov r20, r24
68: 55 27 eor r21, r21
6a: 31 96 adiw r30, 0x01 ; 1
6c: c8 95 lpm
6e: 80 2d mov r24, r0
70: 99 27 eor r25, r25
72: c8 95 lpm
74: 20 2d mov r18, r0
76: 26 95 lsr r18
78: 26 95 lsr r18
7a: 82 1b sub r24, r18
7c: 91 09 sbc r25, r1
7e: 84 0f add r24, r20
80: 95 1f adc r25, r21
82: 00 c0 rjmp .+0 ; 0x84
Note that the non-volatile version makes a poor register choice for the
second lpm, while the volatile version makes a poor register choice for
the first lpm and also manages to use an extra instruction over the
non-volatile version.
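For reference, here is a minimal sketch of the two forms being compared.
It is not the test code from the previous mail, which is not reproduced
here, and it is not a copy of the pgmspace.h source. The names LPM_MACRO
and lpm_inline are hypothetical, and the non-AVR branch (a plain memory
read) is an assumption added only so the sketch also compiles on a host.

```c
#include <stdint.h>

#if defined(__AVR__)
/* Macro form: a GCC statement expression wrapping the LPM instruction.
 * LPM loads the byte at flash address Z (r31:r30) into r0. */
#define LPM_MACRO(addr)                              \
    (__extension__({                                 \
        uint16_t addr16_ = (uint16_t)(addr);         \
        uint8_t result_;                             \
        __asm__ ("lpm"        "\n\t"                 \
                 "mov %0, r0" "\n\t"                 \
                 : "=r" (result_)                    \
                 : "z" (addr16_)                     \
                 : "r0");                            \
        result_;                                     \
    }))
#else
/* Host fallback (assumption): no separate program space, so just read. */
#define LPM_MACRO(addr) (*(const uint8_t *)(addr))
#endif

/* Inline-function form: static __inline__ so the definition can live in
 * a header without causing multiple definitions at link time. */
static __inline__ uint8_t lpm_inline(const uint8_t *addr)
{
#if defined(__AVR__)
    uint8_t result;
    __asm__ ("lpm"        "\n\t"
             "mov %0, r0" "\n\t"
             : "=r" (result)
             : "z" (addr)
             : "r0");
    return result;
#else
    /* Host fallback (assumption): plain memory read. */
    return *addr;
#endif
}
```

Writing __asm__ __volatile__ in place of __asm__ in either form tells the
compiler it may not merge or drop identical asm statements, which would
be consistent with the third lpm in the volatile dump above.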
>
> ---
> Ted Roth
> PGP Key ID: 0x18F846E9
> Jabber ID: address@hidden
Bill Somerville