Re: [Tinycc-devel] Regression on ARM

From: Kirill Smelkov
Subject: Re: [Tinycc-devel] Regression on ARM
Date: Mon, 26 Nov 2012 00:05:19 +0400
On Sat, Nov 24, 2012 at 02:43:34PM +0100, Thomas Preud'homme wrote:
> Le samedi 24 novembre 2012 10:02:54, Kirill Smelkov a écrit :
> > 
> > Thanks for the info. The progress on my side is as follows: I've learned
> > arm assembly and set up arm and armhf cross-toolchains (turned out to be
> > very easy, thanks to emdebian[1]). Also I can run arm binaries via
> > qemu-arm and basic hello world works.
> I'm ashamed, but I have never tried using emdebian. Are there ready-made
> packages in emdebian for installing cross-toolchains easily?

Exactly. The link I gave you has instructions. Also, go to the search tab at
emdebian.org and search e.g. for arm, mips, etc. There are packages for
cross gcc, g++, gdb, gfortran, glibc, etc. Only uclibc is not there.

I've installed toolchains for armel and armhf from unstable.

> > The next step is to analyze the __builtin_frame_address problem. I'll
> > keep you posted.
> Great thanks.

It looks like I know what is happening. E.g. for

    void f(int x, int y)

tcc first saves r0 & r1, and only then fp:

    $ ./arm-eabi-tcc -c y.c
    $ arm-linux-gnueabi-objdump -d y.o

    00000000 <f>:
       0:   e1a0c00d        mov     ip, sp
       4:   e92d0003        push    {r0, r1}
       8:   e92d5800        push    {fp, ip, lr}
       c:   e28db00c        add     fp, sp, #12
      10:   e1a00000        nop                     ; (mov r0, r0)
      14:   e91ba800        ldmdb   fp, {fp, sp, pc}

gcc does not, but it will save r4, r5, r6, etc. before fp (= r11; r13 is sp) ...

    $ arm-linux-gnueabihf-gcc-4.7 -marm -c y.c 
    $ arm-linux-gnueabihf-objdump  -d y.o
    00000000 <f>:
       0:   e52db004        push    {fp}            ; (str fp, [sp, #-4]!)
       4:   e28db000        add     fp, sp, #0
       8:   e24dd00c        sub     sp, sp, #12
       c:   e50b0008        str     r0, [fp, #-8]
      10:   e50b100c        str     r1, [fp, #-12]
      14:   e28bd000        add     sp, fp, #0
      18:   e8bd0800        pop     {fp}
      1c:   e12fff1e        bx      lr

    $ cat y.c
    char *f(int x, int y)
    {
        int i, j, k;
        int r=0;
        /* lots of stuff to force gcc use r4,r5,... */
        for (i=0; i<10; ++i)
            for (j=0; j<15; ++j)
                for (k=0; k< 20; ++k)
                    r += x+y + i+j+k;
        /* + r is just not to kill the above as dead code */
        return (char *)__builtin_frame_address(1) + r;
    }

    $ arm-linux-gnueabihf-gcc-4.7 -marm -O2 -c y.c
    $ arm-linux-gnueabihf-objdump  -d y.o
    00000000 <f>:
       0:   e0801001        add     r1, r0, r1
       4:   e2813001        add     r3, r1, #1
       8:   e92d0870        push    {r4, r5, r6, fp}        ; NOTE r4... go before fp
       c:   e0832183        add     r2, r3, r3, lsl #3
      10:   e3a05000        mov     r5, #0
      14:   e28db00c        add     fp, sp, #12
      18:   e0833082        add     r3, r3, r2, lsl #1
      1c:   e1a04005        mov     r4, r5
      20:   e28360ab        add     r6, r3, #171    ; 0xab
      24:   e085c001        add     ip, r5, r1
      28:   e1a02006        mov     r2, r6
      2c:   e3a03000        mov     r3, #0
      30:   e2833001        add     r3, r3, #1
      34:   e084400c        add     r4, r4, ip
      38:   e353000f        cmp     r3, #15
      3c:   e0844002        add     r4, r4, r2
      40:   e28cc001        add     ip, ip, #1
      44:   e2822013        add     r2, r2, #19
      48:   1afffff8        bne     30 <f+0x30>
      4c:   e2855001        add     r5, r5, #1
      50:   e2866013        add     r6, r6, #19
      54:   e355000a        cmp     r5, #10
      58:   1afffff1        bne     24 <f+0x24>
      5c:   e59b0000        ldr     r0, [fp]                ; BUG here!
      60:   e0800004        add     r0, r0, r4
      64:   e24bd00c        sub     sp, fp, #12
      68:   e8bd0870        pop     {r4, r5, r6, fp}

... as may be required by the ARM calling convention (I have not looked at
the spec yet, but it seems reasonable, given that push is really a block
store (stm) and that stm stores registers in ascending order).
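
To make the ordering point concrete, here is a minimal sketch in plain C
(not tcc code; push_slot_offset and popcount16 are names made up for
illustration). It relies only on the fact that a multi-register push puts
the lowest-numbered register at the lowest address, so the slot of a
register depends on how many lower-numbered registers share the same push:

```c
#include <assert.h>
#include <stdint.h>

/* Number of set bits in a 16-bit register mask. */
static unsigned popcount16(uint16_t m)
{
    unsigned n = 0;
    for (; m; m &= (uint16_t)(m - 1))   /* clear the lowest set bit */
        ++n;
    return n;
}

/*
 * Byte offset, from the sp value *after* a single `push {mask}`,
 * of the slot that holds register `reg`.
 * Bit i of `mask` set means r<i> is in the register list.
 * Hypothetical helper, for illustration only.
 */
static unsigned push_slot_offset(uint16_t mask, unsigned reg)
{
    /* reg must actually be part of the pushed list */
    assert(reg < 16 && (mask & (1u << reg)));
    /* one 4-byte slot per lower-numbered register in the list */
    return 4 * popcount16((uint16_t)(mask & ((1u << reg) - 1)));
}
```

For push {r4, r5, r6, fp} with fp = r11 this gives offset 12 for fp, which
matches the add fp, sp, #12 in the -O2 listing.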

So for an arbitrary function we can't know where its caller's fp is stored
in the frame, *but* for the function currently being compiled we can: we
know how many regs we pushed on the stack first, so we know the offset. So
on arm __builtin_frame_address(level=1) should work, but for level > 1 the
result is undefined. Note: gcc is seemingly wrong here too.
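
As a sanity check of the tcc prologue from the first listing, the toy model
below replays its four instructions and reports where the caller's fp lands
relative to the new fp. This is only a sketch: struct cpu, push and
tcc_saved_fp_offset are hypothetical names, a 64-word array stands in for
the stack, and none of it is tcc code.

```c
#include <assert.h>
#include <stdint.h>

enum { FP = 11, IP = 12, SP = 13, LR = 14 };

struct cpu {
    uint32_t r[16];     /* r0..r15 */
    uint32_t mem[64];   /* toy stack, indexed by byte address / 4 */
};

/* push {mask}: lowest-numbered register goes to the lowest address. */
static void push(struct cpu *c, uint16_t mask)
{
    unsigned n = 0, reg;
    uint32_t addr;

    for (reg = 0; reg < 16; ++reg)
        if (mask & (1u << reg))
            ++n;
    c->r[SP] -= 4 * n;
    addr = c->r[SP];
    for (reg = 0; reg < 16; ++reg)
        if (mask & (1u << reg)) {
            assert(addr / 4 < 64);
            c->mem[addr / 4] = c->r[reg];
            addr += 4;
        }
}

/* Replay the tcc prologue; return (address of saved caller fp) - (new fp). */
static int tcc_saved_fp_offset(void)
{
    struct cpu c = {{0}, {0}};
    const uint32_t caller_fp = 0xC0FFEEu;   /* marker value, arbitrary */
    uint32_t addr;

    c.r[SP] = 128;                          /* arbitrary stack top */
    c.r[FP] = caller_fp;

    c.r[IP] = c.r[SP];                               /* mov  ip, sp       */
    push(&c, (1u << 0) | (1u << 1));                 /* push {r0, r1}     */
    push(&c, (1u << FP) | (1u << IP) | (1u << LR));  /* push {fp, ip, lr} */
    c.r[FP] = c.r[SP] + 12;                          /* add  fp, sp, #12  */

    /* look for the marker in the freshly written stack area */
    for (addr = c.r[SP]; addr < 128; addr += 4)
        if (c.mem[addr / 4] == caller_fp)
            return (int)addr - (int)c.r[FP];
    return 1;   /* marker not found */
}
```

The model puts the caller's fp at fp-12, which agrees with the epilogue
ldmdb fp, {fp, sp, pc} reloading fp from that slot.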

Those are my current findings. I will try to prepare a patch for tcc and
maybe gcc. I'm also not 100% sure, because I had only a very small bit of
time for this.


