[avr-gcc-list] avr-gcc 4.7 fixed-point support [was: Re: fixed point for

avr-gcc-list

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[avr-gcc-list] avr-gcc 4.7 fixed-point support [was: Re: fixed point for

From:	Georg-Johann Lay
Subject:	[avr-gcc-list] avr-gcc 4.7 fixed-point support [was: Re: fixed point for avr gcc]
Date:	Sun, 15 Jul 2012 16:30:39 +0200
User-agent:	Thunderbird 2.0.0.24 (Windows/20100228)

Some time ago I had a private conversation with Sean on porting
his avr-gcc fixed-point support to mainline avr-gcc 4.7.

During the conversation there were several draft versions of
the work against 4.7, but suddenly the conversation stopped
and the fixed-point support ended up nowhere...

Hope that nothing serious happened.

Because I think it's a pity if all that work is lost,
I decided to post the latest version of the patch from
2012-12-13 together with the according email.

Johann


On Wed, Dec 07, 2011 at 06:08:19PM +0100, Georg-Johann Lay wrote:


There are some missing, I attached a patch atop of your patch that


I applied it to my changes, but I could not accept a few minor changes
to avr-fixed.md

@@ -43,25 +42,29 @@
  (define_insn "fract<FIXED2:mode><FIXED1:mode>2"
    [(set (match_operand:FIXED1 0 "register_operand" "=r")
          (fract_convert:FIXED1 (match_operand:FIXED2 1
"register_operand" "r")))]
-  ""
+  "<FIXED1:MODE>mode != <FIXED2:MODE>mode
+   && (!SCALAR_INT_MODE_P (<FIXED1:MODE>mode)
+       || !SCALAR_INT_MODE_P (<FIXED2:MODE>mode))"
    {

What is the purpose of this logic?  I think it is redundant within gcc
already.

@@ -267,8 +270,8 @@
  ;; "*mulsa3_call" "*mulusa3_call"
  (define_insn "*mul<mode>3_call"
    [(set (reg:ALLSA 14) (mult:ALLSA (reg:ALLSA 18) (reg:ALLSA 24)))
-   (clobber (reg:ALLSA 18))
-   (clobber (reg:ALLSA 24))
+   (clobber (reg:SI 18))
+   (clobber (reg:SI 24))
     (clobber (reg:HI 22))]
    "!AVR_HAVE_MUL"
    "%~call __mul<mode>3"

This change caused a new internal compiler error with my test code,
so I left it as ALLSA.

Have a look at the new %T as documented at top of avr.md.

I tried to rewrite avr_fract_out and got lost in the function.

That function need definitely cleanup, to be disentangled and

better documented!

Take a look at the new latest patch. I made it from newest svn gccwith hopefully all the new updates. I got rid of all the sprintf anddid some cleanup in fract_out.

It is a potential source of errors and hard to understand and maintain in its
current style.  For example, with blen[i] = GET_MODE_BITSIZE (GET_MODE
(operands[i])):

  xop[2] = GEN_INT (blen[0] - 1);
  xop[3] = GEN_INT (blen[1] - 1);

  /* Store the sign bit if the destination is a signed
     fract and the source has a sign in the integer part.  */

  if (sbit[0] && !ilen[0] && sbit[1] && ilen[1])
    {
      /* To avoid using BST and BLD if the source and
         destination registers overlap we can use a single LSL
         since we don't care about preserving the source register.  */

      if (rdest < rsource + tlen[1]
          && rdest + tlen[0] > rsource)
        {
          avr_asm_len ("lsl %T1%t3", xop, plen, 1);
          sign_in_Carry = true;
        }
      else
        {
          avr_asm_len ("bst %T1%T3", xop, plen, 1);
          sign_in_T = true;
        }
    }

But I got lost in the remainder of the routine and gave up,
so it's in a mess with my patch...

Document what the routine does and which part handles what bytes,
what is already finished and what is still to be done.

Moreover, it appears you do sprintf to buf[] but never actually output
it in some places?

It's just remark on my personal preference.

One more thing: Don't use magic comments like

;; -*- Mode: Scheme -*-


I forgot about that!

in avr-fixed.md!

If you like emacs, you can bind extension(s) to major-mode like so (provided
you have support for md-mode):

(setq auto-mode-alist
      (cons '("\\.md$" . md-mode)
            auto-mode-alist))

with the additional benefit that *any* md-file you open will be in the right
mode. More:



Ok, perfect I think I get the same effect as my magic comment.

(setq auto-mode-alist
      (cons '("[a-z\-]+\\.def$" . c-mode)
            auto-mode-alist))

(setq auto-mode-alist
      (cons '("Makefile\\.def$" . autoconf-mode)
            auto-mode-alist))

(setq auto-mode-alist
      (cons '("config.gcc$" . sh-mode)
            auto-mode-alist))

(setq auto-mode-alist
      (cons '("\\.opt$" . md-mode)
            auto-mode-alist))

(setq auto-mode-alist
      (cons '(".*\\bt\\-[a-zA-Z0-9\-]+\\'" . makefile-mode)
            auto-mode-alist))

(setq auto-mode-alist
      (cons '("ChangeLog.*$" . change-log-mode)
            auto-mode-alist))

One more non-gcc related hint:

You can set diff-cmd in your .subversion/config:

### This file configures various client-side behaviors.
...
### Section for configuring external helper applications.
[helpers]
...
diff-cmd=.../svndiff

and the svndiff:

#!/bin/sh
diff=/usr/bin/diff
extra_args="-F^(define"

# Do not pass multiple -c/-u options
case "$1" in
    -c*|-C*) exec ${diff} -p ${extra_args} "$@" ;;
    -u*|-U*) exec ${diff} -p ${extra_args} "$@" ;;
    *) exec ${diff} -up ${extra_args} "$@" ;;
esac

That way you get context of changes in diff for insns, expanders, peepholes,
etc. (any line starting with (define starts a new context :-))


Wow, I have to try it!


Suppose addsa3 insn. It has 2 constraint alternatives: "r,0,r" and "d,0,i".

If $2 is CONST, it comes to reload and there is no more d-register, reload has
to satisfy the constraints in some way. As there is only one alternative that
supports "r" in $0, it picks that constraint and reloads $2 to some register
(which is always possible). That way, you need an SAmode register to load the
constant which is 4 bytes. This increases register pressure which might lead to
more spills/stack space and is suboptimal with respect to the generated
instructions. Suppose you add 0xffff0001, then in the better case where the
reload reg is r-reg:

LDI r20, 0x01 ; movsa
LDI r21, 0x00
LDI r22, 0xff
LDI r23, 0xff

ADD r10, r20
ADC r11, r21
ADC r12, r22
ADC r13, r23

and in the case where the reload reg is a l-reg:

LDI r30, 0x01 ; movsa
mov r6, r30
mov r7, __zero_reg__
LDI r30, 0xff
mov r8, r30
mov r9, r30

ADD r10, r6
ADC r11, r7
ADC r12, r8
ADC r13, r9

With a QI clobber and using avr_out_plus_1, however, you get:

LDI r30, 0xff
sub r10, r30
sbc r11, r30
sbc r12, __zero_reg
sbc r13, __zero_reg

Notice that you can use avr_out_plus_1 to find out if adding is better or
subtracting the negated constant.

I'd suppose you don't include this in your initial patch.
Better do tweaks in a follow-up patch


I agree.  There are many other optimizations besides this one.

Wouldn't it be nice if we could just describe every possible opcode
to gcc and what it did, then have it sit there and figure out the
best way to do everything?  Then it wouldn't be any extra work to
support new types.


The problem is too hard, many NP-complete problems all over the place.
Therefore, GCC uses several passes to narrow down a good solution step-by-step.
Notice that in GCC no algorithms with quadratic or even worse complexitx are
accepted.


Yes, because it would be working backwards.  I wrote a program which
compounded assembly instructions and could convert sequences to shorter
sequences.
I got to about 5 instructions deep on the avr before the complexity was
too great.

It isn't really an easy problem, but I believe it is possible.  Maybe a
program which could generate rtl from an instruction set.  It might take
a while to accomplish this, but it's ok because it only needs to run
once in a while.

Are add/sub insns really needed because they are not mapped to
qi/hi/si if no insn is available?

* If you are more convenient with non-ABI interface then just do it.
I even dared to change the ABI for 32 bit multiplication to pass
values in R26.  If you are using FMUL* it might be better to use
non-ABI interface to avoid moves.


I only use FMUL on the fractional types with sign bits.

== Potential Problems/Bugs ==

* You must provide ChangeLog


I will add that once the patch is finalized.
Is it part of the patch or a separate file?


Don't include them in the patch, just write it as text and include it in the
mail you send for review, see

http://gcc.gnu.org/ml/gcc-patches/2011-11/msg01820.html

At the end of the mail (prior to the attachment) there is the ChangeLog as it
will be used as commit message. This text is inserted into respective ChangeLog
files after approval and committed as one change-set, see the respective commit
(message) and changes to respective ChangeLogs:

http://gcc.gnu.org/viewcvs?view=revision&revision=181482

* You intend to supply test cases for test suite, too?
Or is the overall fixed regression tests in the suite enough?


I did not intend to supply any test cases, but it would be nice.
I don't think the fixed regression tests are sufficient,
but I need to take another look.


There are still more problems: If there is no move insn, there must not be
insns that might require respective reloads. Example: If you write an adddi3
insn to add 8-byte integers (an insn which would be more or less straight
forward), you'd also need to supply movdi (writing it would by no means be
straight forward).

This is the reason why here is no native DImode support in avr-gcc: movdi is
just too painful to write.

It's similar with DAmode or TAmode. You must not write insns that might request
respective reloads: A DA pseude might need 8-byte reload which is nowhere in
avr.md. If reload needs to reload such a value, it need respective insn to
accomplish the reload.

So [U]DA, [U]TA in FIXED1/2 mode iterator are not appropriate.


Actually these modes should still work.  I'm not too sure about the
correctness of 128bit TAmode, because I tried to use it and it loaded
the constant wrong.

DAmode seems fine.  It has compiled c code to support it so it would be
rather slow, but it should still work.  The saturating types are also
probably supported with c routines.

Maybe supporting saturating types isn't that hard.

* How is insn length adjusted for "fract_out"? See how it works for, e.g.
$   grep -i addto_sp avr.*
and observe how addto_sp from adjust_len attribute is used in
avr.c:adjust_insn_length().


I did an outline of this in the patch-atop-patch.


Sean

With this func, 4 extra mov instructions
are emmited that need not be generated.

accum func(short accum y)
{
     return y;
}


func:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
 ; (insn 6 3 31 (set (reg:SA 24 r24 [45])
 ;         (fract_convert:SA (reg:HA 24 r24 [ y ]))) test.c:7 208 {fracthasa2}
 ;      (nil))
        clr r27  ;  6   fracthasa2      [length = 6]
        mov r26, r25
        mov r25, r24
        clr r24
        sbrc r26,7
        com r27
 ; (insn 31 6 32 (set (reg:QI 22 r22 [55])
 ;         (reg:QI 24 r24 [orig:51 <retval> ] [51])) test.c:8 899 {movqi_insn}
 ;      (expr_list:REG_DEAD (reg:QI 21 r21 [orig:51 <retval> ] [51])
 ;         (nil)))
        mov r22,r24      ;  31  movqi_insn/1    [length = 1]
 ; (insn 32 31 33 (set (reg:QI 23 r23 [+1 ])
 ;         (reg:QI 25 r25 [orig:52 <retval>+1 ] [52])) test.c:8 899 {movqi_insn}
 ;      (expr_list:REG_DEAD (reg:QI 20 r20 [orig:52 <retval>+1 ] [52])
 ;         (nil)))
        mov r23,r25      ;  32  movqi_insn/1    [length = 1]
 ; (insn 33 32 34 (set (reg:QI 24 r24 [+2 ])
 ;         (reg:QI 26 r26 [orig:53 <retval>+2 ] [53])) test.c:8 899 {movqi_insn}
 ;      (expr_list:REG_DEAD (reg:QI 19 r19 [orig:53 <retval>+2 ] [53])
 ;         (nil)))
        mov r24,r26      ;  33  movqi_insn/1    [length = 1]
 ; (insn 34 33 14 (set (reg:QI 25 r25 [+3 ])
 ;         (reg:QI 27 r27 [orig:54 <retval>+3 ] [54])) test.c:8 899 {movqi_insn}
 ;      (expr_list:REG_DEAD (reg:QI 18 r18 [orig:54 <retval>+3 ] [54])
 ;         (nil)))
        mov r25,r27      ;  34  movqi_insn/1    [length = 1]
 ; (jump_insn 39 14 38 (return) test.c:8 1231 {return}
 ;      (nil)
 ;  -> return)
        ret      ;  39  return  [length = 1]


Maybe because gcc
has some alignment constraints and can't just ask
fract_out for the accum in the return registers
instead of some other register group which it then
copies to the return registers.

other way around, no problem.

short accum func(accum y)
{
     return y;
}


func:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
 ; (insn 6 3 14 (set (reg:HA 24 r24 [45])
 ;         (fract_convert:HA (reg/v:SA 22 r22 [orig:44 y ] [44]))) test.c:7 170 
{fractsaha2}
 ;      (expr_list:REG_DEAD (reg:QI 23 r23)
 ;         (expr_list:REG_DEAD (reg:QI 22 r22)
 ;             (nil))))
        mov r25, r24     ;  6   fractsaha2      [length = 2]
        mov r24, r23
 ; (jump_insn 31 14 30 (return) test.c:8 1231 {return}
 ;      (nil)
 ;  -> return)
        ret      ;  31  return  [length = 1]

avr-gcc-fixedpoint-13-12-2011.diff.bz2
Description: Binary data

[Prev in Thread]

Current Thread

[Next in Thread]

[avr-gcc-list] avr-gcc 4.7 fixed-point support [was: Re: fixed point for avr gcc], Georg-Johann Lay <=

Prev by Date: Re: [avr-gcc-list] Speeding [u]ltoa by 35%
Next by Date: Re: [avr-gcc-list] Speeding [u]ltoa by 35%
Previous by thread: [avr-gcc-list] (no subject)
Next by thread: [avr-gcc-list] Pointer register allocation optimizer
Index(es):
- Date
- Thread