
Re: [avr-gcc-list] Missed optimisations


From: Wouter van Gulik
Subject: Re: [avr-gcc-list] Missed optimisations
Date: Mon, 14 Jan 2008 16:33:58 +0100
User-agent: Thunderbird 2.0.0.9 (Windows/20071031)

David Brown wrote:

The code is basically good (the "swap" instruction is used for the shifts, which is very nice - a big improvement over the older 3.4 gcc), but there are a few missed optimisations shown here that are probably quite common in other code.

Why is the address of crcTable8n loaded into r18:r19 first, before being copied into r30:r31 for the address calculation? It seems that this happens when the address is reused - if it is not reused, then r30:r31 are loaded directly. However, the reuse does not benefit from having the address in a register - the "add r30, r18" and "adc r31, r19" on lines 68 and 69 could be replaced with subi and sbci instructions to save space and time, and to free registers r18:r19. On most RISC cpus, storing the address in a register for reuse would be a benefit, which is probably why this code is generated - on the AVR, it is not helpful (at least, not here).
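The subi/sbci suggestion relies on the fact that the AVR has no add-immediate instruction, so a constant is normally added by subtracting its two's-complement negation. As a rough illustration only, the arithmetic can be modelled in portable C. The function name add_via_subi_sbci is hypothetical, and this is a byte-level model of the idea, not compiler output:

```c
#include <stdint.h>

/* Model of "subi r30, lo8(-(K))" followed by "sbci r31, hi8(-(K))".
 * Subtracting the negated constant byte by byte, with the borrow carried
 * from the low byte into the high byte, gives the same result as a
 * 16-bit addition of K. Hypothetical helper, for illustration only. */
uint16_t add_via_subi_sbci(uint16_t z, uint16_t k)
{
    uint16_t neg = (uint16_t)(0u - k);        /* -(K) modulo 65536 */
    uint8_t lo = (uint8_t)(z & 0xffu);        /* r30 */
    uint8_t hi = (uint8_t)(z >> 8);           /* r31 */
    uint8_t neg_lo = (uint8_t)(neg & 0xffu);  /* lo8(-(K)) */
    uint8_t neg_hi = (uint8_t)(neg >> 8);     /* hi8(-(K)) */

    uint8_t borrow = (uint8_t)(lo < neg_lo);  /* carry flag after subi */
    lo = (uint8_t)(lo - neg_lo);              /* subi */
    hi = (uint8_t)(hi - neg_hi - borrow);     /* sbci */

    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

Because the table address is a link-time constant, this form needs no register pair to hold it, which is what would free r18:r19.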


I don't know, but it happens quite often that registers are not reused when they could have been. Maybe it is because lpm is in a macro. Try replacing it with a normal table index. If that helps, write the "ld r??, Z" in an assembler macro to be sure.
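A minimal sketch of that experiment, assuming the table is currently read through pgm_read_byte from avr/pgmspace.h. The table contents and the names demo_table_flash, demo_table_ram, lookup_via_macro and lookup_via_index are placeholders, not the original crcTable8n code, and the #else branch only exists so the file also builds on a PC:

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host stand-ins so the sketch compiles off-target (assumption). */
#define PROGMEM
#define pgm_read_byte(addr) (*(const uint8_t *)(addr))
#endif

/* Placeholder contents; the real table values are not shown in the thread. */
static const uint8_t demo_table_flash[16] PROGMEM = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff
};

/* Same placeholder data without PROGMEM, so a plain index can be used. */
static const uint8_t demo_table_ram[16] = {
    0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
    0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff
};

/* Variant A: read through the lpm-based macro. */
uint8_t lookup_via_macro(uint8_t i)
{
    return pgm_read_byte(&demo_table_flash[i & 0x0f]);
}

/* Variant B: normal table index, no inline assembler involved. */
uint8_t lookup_via_index(uint8_t i)
{
    return demo_table_ram[i & 0x0f];
}
```

Compiling both variants with -S and comparing how the table address is loaded should show whether the inline assembler in the macro is what stops the direct load into r30:r31.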

Secondly, the "(data & 0x0f)" clause generates messy 16-bit code. I realise C requires integer promotion in such cases, but it's important to try to remove unnecessary code such as loading the high register with zero, then anding it with zero, then eoring it. gcc version 3.4.6 was sometimes marginally better at such code. It should be noted that the quality of the generated code depends very much on the exact expression - the original "[(crc >> 4) ^ (data & 0x0f)]" generates poor code, while the equivalent "[((crc >> 4) ^ data) & 0x0f]" generates tight code.
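For reference, the two index expressions can be written side by side as below. This assumes crc and data are both uint8_t, which the thread does not state explicitly, and the function names are hypothetical. With 8-bit operands, (crc >> 4) already lies in 0..15, so masking before or after the XOR gives the same index:

```c
#include <stdint.h>

/* Form reported to generate poor code: mask first, then XOR. */
uint8_t index_original(uint8_t crc, uint8_t data)
{
    return (uint8_t)((crc >> 4) ^ (data & 0x0f));
}

/* Form reported to generate tight code: XOR first, then mask once. */
uint8_t index_rewritten(uint8_t crc, uint8_t data)
{
    return (uint8_t)(((crc >> 4) ^ data) & 0x0f);
}
```

Only the placement of the mask differs, so any difference in the generated code comes from how the compiler handles the promoted 16-bit intermediates, not from the arithmetic itself.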


Hmm, yes it really gets messy on r31/r23:

  62 0020 F0E0              ldi r31,lo8(0)  ; load with 0
  63 0022 70E0              ldi r23,lo8(0)  ; load with 0
  64 0024 6F70              andi r22,lo8(15);
  65 0026 7070              andi r23,hi8(15); re-load R23 with 0
  66 0028 E627              eor r30,r22     ;
  67 002a F727              eor r31,r23     ; zero XOR zero == 0
  68 002c E20F              add r30,r18     ;
  69 002e F31F              adc r31,r19     ;

This is a known "feature". The patches Andrew Hutchinson is working (?) on are supposed to improve this.

I am wondering why r31 and r23 are loaded before the operations. It seems that gcc 4.2.x moves the loading of variables a little further away from their use, but this does not benefit the AVR.

HTH,

Wouter




