avr-gcc-list

Re: [avr-gcc-list] [bug] cbi optimization error for 8-bit AVRs


From: Georg-Johann Lay
Subject: Re: [avr-gcc-list] [bug] cbi optimization error for 8-bit AVRs
Date: Mon, 17 Nov 2014 18:17:42 +0100
User-agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.6.0

On 11/09/2014 10:00 PM, Joern Rennecke wrote:
On 8 November 2014 00:32, Szikra István <address@hidden> wrote:
Hi everyone!

My problem in short: I’m getting
         in      r24, 0x18
         ldi     r25, 0x00
         andi    r24, 0xEF
         out     0x18, r24
instead of
         cbi     0x18, 4
.

I’m trying to write efficient modern C/C++ code for multiple platforms
including AVR 8 bit controllers.

Unfortunately, GCC support for AVR (among other things) is not always
flawless, and it changes from version to version (and not always for the
better).
Since I’m a long-time AVR developer I have a lot of compiler versions
installed (WinAVR 2006-2010, and Atmel Studio 6.2 with GCC 4.8.1), but I
could only test my code with one of them (the latest).

I ran into some trouble with port-bit clears not being translated from C into
cbi in assembly. It is caused by my bits template, but I do not know why.
It looks like a bug in GCC to me. Maybe someone here can shed some light on
the reason, or suggest a fix.

The transformation would seem, prima facie, best suited for the combine pass.

For the code in question, it is already emitted differently by the C and C++ front ends.

Consider


#define RR (*(unsigned char volatile*) (0x18 + __AVR_SFR_OFFSET__))

static unsigned bits (int n)
{
    return 1u << n;
}

void fun (void)
{
    RR &= ~bits(4);
}


as a test case that works for both C and C++. The C++ front end fails to apply a type demotion like the C front end does, and no tree pass performs such a demotion either.

When we see the store, we know it's only 8 bits wide, so we can perform the
arithmetic in 8 bits.  However, the store is to a volatile (actually,
I/O) address.
Now, gcc is not very good at optimizing code involving volatile (the
target-independent code would likely already have simplified the
arithmetic if the volatile accesses weren't in the way -- but then you'd
really need a completely different test case without volatile to keep
the computation relevant).
These memory / I/O accesses require extra ordering constraints, so
large parts of the optimizers simply punt when they encounter a volatile
reference.
There is some scope to tweak this - I've attached a proof-of-concept
patch to optimize

Is there a solution that is more reliable than combine? Currently combine appears to be the best candidate. However, many constraints must be satisfied for such a non-matching split to apply: optimization must be turned on (and it's not only an optimization issue, because CBI/SBI are atomic whereas IN/OP/OUT sequences are not), combine must have a scratch handy, the single-set must have been cooked up by combine from 3 insns, etc. And a matching split is not possible, for obvious reasons.

It actually boils down to supplying combine with a split point it fails to find itself -- for whatever reason...

your code (based on gcc 5.0).  However, this opens the possibility
that it'll break something with respect to volatile ordering now - or
even later down the line.

ISTR we have some more detailed dependency checks, at least in some places. In
fact, if there weren't, we should already see the existing cbi pattern
misbehaving.
Consider:
typedef unsigned char uint8_t;

int main (void)
{
    int i = *(volatile uint8_t *) (0x18 + 0x20);
    *(volatile uint8_t *) (0x17 + 0x20) = 0x0f;
    *(volatile uint8_t *) (0x18 + 0x20) = i & ~4;
}
If the compiler were oblivious to the intervening write, it could
generate a cbi here -
but it doesn't - well, at least not at -O2 ...

The combiner must not come up with a CBI-like pattern for this; it would mean combine drags one volatile access over another, and that would be incorrect for the same reasons that PR51374 was a combine bug.

Another possible approach would be to use a peephole2 pattern or a
target-specific

peephole2 is even less reliable than combine here :-( because the problem is about data flow rather than code flow. A single (move) insn that has nothing to do with the operation intervenes, and peep2 fails... And in the test case there are also moves involved, so it's almost impossible to recover from that.

optimization pass to do the 16->8 bit arithmetic transformation.
However, if you work after combine (as is definitely the case with peephole2),
you need to


diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index cc85b79..6e7e353 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -679,8 +679,8 @@ (define_expand "mov<mode>"
 ;; "movqi_insn"
 ;; "movqq_insn" "movuqq_insn"
 (define_insn "mov<mode>_insn"
-  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r    ,d    ,Qm   ,r ,q,r,*r")
-        (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r Y00,Qm,r,q,i"))]
+  [(set (match_operand:ALL1 0 "movdst_operand"     "=r    ,d    ,Qm   ,r ,q,r,*r")
+        (match_operand:ALL1 1 "nox_general_operand" "r Y00,n Ynn,r Y00,Qm,r,q,i"))]
   "(register_operand (operands[0], <MODE>mode)
    || reg_or_0_operand (operands[1], <MODE>mode)) &&
    /* skip if operands are out of lds/sts memory access range(0x40..0xbf)
@@ -3086,6 +3086,20 @@ (define_peephole2 ; andi
     operands[1] = GEN_INT (INTVAL (operands[1]) & INTVAL (operands[2]));
   })

+; standard promotion can get us a 16 bit int where only an 8 bit value
+; is required.  combine-split extend-binop-store to binop-store.
+
+(define_split
+  [(set (match_operand:QI 0 "any_qi_mem_operand")
+       (match_operator 3 "avr_binary_operator"
+        [(match_operand 1 "nonmemory_operand")
+         (match_operand 2 "nonmemory_operand")]))
+   (clobber (match_operand:QI 4 "register_operand"))]

This would work for all 8-bit modes, something like

(define_code_iterator some_binop [xor ior and plus minus mult ashift])

(define_split
  [(set (mem:ALL1 (match_operand:ALL1 0 "any_qi_mem_operand" ""))
        (some_binop:ALL1 (match_operand:ALL1 1 "nonmemory_operand" "")
                         (match_operand:ALL1 2 "nonmemory_operand" "")))
   (clobber (match_operand:ALL1 3 "register_operand"))]
  ""
  [(set (match_dup 3)
        (some_binop:ALL1 (match_dup 1)
                         (match_dup 2)))
   (set (match_dup 0)
        (match_dup 3))]

and then something like 1 == GET_MODE_SIZE (mode) in the predicate.

+  ""
+  [(set (match_dup 4) (match_dup 3))
+   (set (match_dup 0) (match_dup 4))]
+  "");
+
 ;;|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
 ;; ior

diff --git a/gcc/config/avr/predicates.md b/gcc/config/avr/predicates.md
index 4b456a5..ea8e095 100644
--- a/gcc/config/avr/predicates.md
+++ b/gcc/config/avr/predicates.md
@@ -69,6 +69,24 @@ (define_predicate "nop_general_operand"
   (and (match_operand 0 "general_operand")
        (match_test "!avr_mem_flash_p (op)")))

+; allow a QImode memory operand, including volatile / I/O.
+(define_predicate "any_qi_mem_operand"
+  (and (match_code "mem")
+       (match_test "address_operand (XEXP (op, 0), Pmode)")
+       (match_test "mode == QImode")))
+
+;; We also allow volatile memory here, so that the combiner can work with
+;; stores for I/O accesses as combiner-splitter results.
+;; As this is for a destination location, that should generally be safe
+;; for combine, as we should be handling the last insn then, which stays
+;; in place.
+;; ??? Could this ever match for a two-destination combination?
+;; ??? Should we make sure this predicate can't match I/O for transformations
+;; with a wider scope, like store sinking?

Unfortunately, there is no easy way to query GCC for the current pass, e.g. something like combine_in_progress or combine_completed...

Might it be the case that some pass changes the access width or splits one HI access into 2 QI accesses, e.g. subreg lowering? Using a different predicate (or otherwise hacking around volatile_ok) is unavoidable to make this solution work with combine, hence it would be highly appreciated to forge the new predicate in such a way that it behaves the same as the original one except for the insn combiner.

+(define_predicate "movdst_operand"
+  (ior (match_operand 0 "nonimmediate_operand")
+       (match_operand 0 "any_qi_mem_operand")))
+
 ;; Return 1 if OP is an "ordinary" general operand, i.e. a general
 ;; operand whose load is not handled by a libgcc call or ELPM.
 (define_predicate "nox_general_operand"
@@ -180,6 +198,10 @@ (define_predicate "simple_comparison_operator"
   (and (match_operand 0 "comparison_operator")
        (not (match_code "gt,gtu,le,leu"))))

+; A binary operator that the avr can perform in a single insn.
+(define_predicate "avr_binary_operator"
+  (match_code "and,ior,xor,plus,minus"))
+
 ;; Return true if OP is a valid call operand.
 (define_predicate "call_insn_operand"
   (and (match_code "mem")



