Re: [avr-gcc-list] [bug] cbi optimization error for 8-bit AVRs
From: Georg-Johann Lay
Subject: Re: [avr-gcc-list] [bug] cbi optimization error for 8-bit AVRs
Date: Mon, 17 Nov 2014 18:17:42 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.6.0
On 11/09/2014 10:00 PM, Joern Rennecke wrote:
On 8 November 2014 00:32, Szikra István <address@hidden> wrote:
Hi everyone!
My problem in short: I'm getting
in r24, 0x18
ldi r25, 0x00
andi r24, 0xEF
out 0x18, r24
instead of
cbi 0x18, 4
.
I’m trying to write efficient modern C/C++ code for multiple platforms
including AVR 8 bit controllers.
Unfortunately, GCC support for AVR (among other things) is not always
flawless, and it changes from version to version (and not always for the
better).
Since I’m a long time AVR developer I have a lot of compiler versions
installed (WinAVR 2006-2010, and Atmel Studio 6.2 with GCC 4.8.1), but I
could test my code with only one (the latest).
I ran into some trouble with clearing port bits not translating from C into
cbi in assembler. It is caused by my bits template, but I do not know why.
It seems to me like a bug in GCC. Maybe someone here can shed some light on
the reason, or suggest a fix.
The transformation would seem prima facie best suited for the combine pass.
For the code in question, it's already emitted differently by the C and C++ front ends.
Consider
#define RR (*(unsigned char volatile*) (0x18 + __AVR_SFR_OFFSET__))
static unsigned bits (int n)
{
return 1u << n;
}
void fun (void)
{
RR &= ~bits(4);
}
as a test case which works for both C and C++. The C++ front end fails to apply
a type demotion similar to the C front end's, and no tree pass performs such a
demotion, either.
When we see the store, we know it's only 8 bits wide, thus we can perform the
arithmetic in 8 bits. However, the store is to a volatile (actually, I/O)
address.
Now, gcc is not very good at optimizing code involving volatile (the
target-independent code would likely already have simplified the arithmetic if
these weren't in the way - but you'd really need a completely different test
case without volatile to keep the computation relevant).
These memory / I/O accesses require extra ordering constraints, so
large parts of the optimizers just punt when they encounter a volatile
reference.
There is some scope to tweak this - I've attached a proof-of-concept
patch to optimize
Is there a solution that is more reliable than combine? Currently combine
appears to be the best candidate. However, there are many constraints that must
be satisfied for such a non-matching split to apply. Optimization must be turned
on (it's not only an optimization issue, because CBI/SBI are atomic whereas
IN/OP/OUT sequences are not), combine must have a scratch handy, the single set
must have been cooked up by combine and from 3 insns, etc. And a matching
split is not possible for obvious reasons.
It actually boils down to supplying combine with a split point it fails to find
itself -- for whatever reason...
your code (based on gcc 5.0). However, this opens the possibility
that it'll break something with respect to volatile ordering now - or
even later down the line.
ISTR we have some more detailed dependency checks at least in some places. In
fact, if there weren't, we should already see the existing cbi pattern
misbehaving.
Consider:
typedef unsigned char uint8_t;
int main (void)
{
  int i = *(volatile uint8_t *)(0x18 + 0x20);
  *(volatile uint8_t *)(0x17 + 0x20) = 0x0f;
  *(volatile uint8_t *)(0x18 + 0x20) = i & ~4;
}
If the compiler were oblivious to the intervening write, it could
generate a cbi here -
but it doesn't - well, at least not at -O2 ...
The combiner must not come up with a CBI-like pattern for this; it would mean
combine drags one volatile access over another, and that would be incorrect for
the same reasons why PR51374 was a combine bug.
Another possible approach would be to use a peephole2 pattern or a
target-specific
peephole2 is even less reliable than combine here :-( because it matches on
code layout rather than data flow. A single (move) insn that has nothing to do
with the operation gets in between, and peep2 fails... And in the test case
there are also moves involved, so it's almost impossible to recover from that.
optimization pass to do the 16->8 bit arithmetic transformation.
However, if you work after combine (as is definitely the case with peephole2),
you need to
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index cc85b79..6e7e353 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -679,8 +679,8 @@ (define_expand "mov<mode>"
;; "movqi_insn"
;; "movqq_insn" "movuqq_insn"
(define_insn "mov<mode>_insn"
-  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r ,d ,Qm ,r ,q,r,*r")
-        (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r Y00,Qm,r,q,i"))]
+  [(set (match_operand:ALL1 0 "movdst_operand"       "=r ,d ,Qm ,r ,q,r,*r")
+        (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r Y00,Qm,r,q,i"))]
"(register_operand (operands[0], <MODE>mode)
|| reg_or_0_operand (operands[1], <MODE>mode)) &&
/* skip if operands are out of lds/sts memory access range(0x40..0xbf)
@@ -3086,6 +3086,20 @@ (define_peephole2 ; andi
operands[1] = GEN_INT (INTVAL (operands[1]) & INTVAL (operands[2]));
})
+; standard promotion can get us a 16 bit int where only an 8 bit value
+; is required. combine-split extend-binop-store to binop-store.
+
+(define_split
+ [(set (match_operand:QI 0 "any_qi_mem_operand")
+ (match_operator 3 "avr_binary_operator"
+ [(match_operand 1 "nonmemory_operand")
+ (match_operand 2 "nonmemory_operand")]))
+ (clobber (match_operand:QI 4 "register_operand"))]
This would work for all 8-bit modes, something like
(define_code_iterator some_binop [xor ior and plus minus mult ashift])
(define_split
[(set (mem:ALL1 (match_operand:ALL1 0 "any_qi_mem_operand" ""))
(some_binop:ALL1 (match_operand:ALL1 1 "nonmemory_operand" "")
(match_operand:ALL1 2 "nonmemory_operand" "")))
(clobber (match_operand:ALL1 3 "register_operand"))]
""
[(set (match_dup 3)
(some_binop:ALL1 (match_dup 1)
(match_dup 2)))
(set (match_dup 0)
(match_dup 3))]
and then something like 1 == GET_MODE_SIZE (mode) in the predicate.
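Putting those two suggestions together, the patch's predicate could be generalized along these lines (an untested sketch; the name any_all1_mem_operand is made up here, and the GET_MODE_SIZE test is the one proposed above):

```
;; Accept any 1-byte memory operand, including volatile / I/O,
;; regardless of which ALL1 mode it has.
(define_predicate "any_all1_mem_operand"
  (and (match_code "mem")
       (match_test "address_operand (XEXP (op, 0), Pmode)")
       (match_test "1 == GET_MODE_SIZE (mode)")))
```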
+ ""
+ [(set (match_dup 4) (match_dup 3))
+ (set (match_dup 0) (match_dup 4))]
+  "")
+
;;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
;; ior
diff --git a/gcc/config/avr/predicates.md b/gcc/config/avr/predicates.md
index 4b456a5..ea8e095 100644
--- a/gcc/config/avr/predicates.md
+++ b/gcc/config/avr/predicates.md
@@ -69,6 +69,24 @@ (define_predicate "nop_general_operand"
(and (match_operand 0 "general_operand")
(match_test "!avr_mem_flash_p (op)")))
+; allow a QImode memory operand, including volatile / I/O.
+(define_predicate "any_qi_mem_operand"
+ (and (match_code "mem")
+ (match_test "address_operand (XEXP (op, 0), Pmode)")
+ (match_test "mode == QImode")))
+
+;; We also allow volatile memory here, so that the combiner can work with
+;; stores for I/O accesses as combiner-splitter results.
+;; As this is for a destination location, that should generally be safe
+;; for combine, as we should be handling the last insn then, which stays
+;; in place.
+;; ??? Could this ever match for a two-destination combination?
+;; ??? Should we make sure this predicate can't match I/O for transformations
+;; with a wider scope, like store sinking?
Unfortunately, there is no easy way to query GCC for the current pass,
e.g. something like combine_in_progress or combine_completed...
Might it be the case that some pass changes the access width or splits one HI
access into 2*QI, e.g. subreg lowering? Using a different predicate (or
otherwise hacking around volatile_ok) is inevitable to make this solution work
with combine, hence it would be highly appreciated to forge the new predicate
in such a way that it is the same as the original one except for the insn combiner.
+(define_predicate "movdst_operand"
+ (ior (match_operand 0 "nonimmediate_operand")
+ (match_operand 0 "any_qi_mem_operand")))
+
;; Return 1 if OP is an "ordinary" general operand, i.e. a general
;; operand whose load is not handled by a libgcc call or ELPM.
(define_predicate "nox_general_operand"
@@ -180,6 +198,10 @@ (define_predicate "simple_comparison_operator"
(and (match_operand 0 "comparison_operator")
(not (match_code "gt,gtu,le,leu"))))
+; A binary operator that the avr can perform in a single insn.
+(define_predicate "avr_binary_operator"
+ (match_code "and,ior,xor,plus,minus"))
+
;; Return true if OP is a valid call operand.
(define_predicate "call_insn_operand"
(and (match_code "mem")