[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correc
From: |
Thomas Preud'homme |
Subject: |
Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correctly |
Date: |
Sat, 5 Jan 2013 15:36:49 +0100 |
User-agent: |
KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; ) |
Le jeudi 13 décembre 2012 17:55:41, vous avez écrit :
[SNIP]
>
> from arch/arm/kernel/stacktrace.c:
>
> * With framepointer enabled, a simple function prologue looks like
> this: * mov ip, sp
> * stmdb sp!, {fp, ip, lr, pc}
> * sub fp, ip, #4
> *
> * A simple function epilogue looks like this:
> * ldm sp, {fp, sp, pc}
>
> so since fp is initialized from ip-4 (old_sp-4) it points to (near ?)
> fp, not pc, right?
stmdb means sp is decremented first (sp = sp - 4 * nbregs) and then register
are saved in ascending order from this new location. So, pc will end up at
initial_sp - 4 :) Hence ip points at pc.
>
> also linux kernel is compiled with
>
> -mapcs -mno-sched-prolog -mabi=apcs-gnu
>
> and this changes frame layout compared to what gcc produces by default.
>
> (sorry for that near - i don't remember stack and stmdb arm details
> good enough)
stmdb stands for STore Multiple Decrement Before
>
> > > 8: e92d0870 push {r4, r5, r6, fp} ; NOTE
> > > r4... go before fp
> > >
> > > 14: e28db00c add fp, sp, #12
> > >
> > > 5c: e59b0000 ldr r0, [fp] ; BUG here!
> >
> > Why is this a bug in GCC? It will load the old value of fp to r0.
>
> Hmm, after push here is how stack looks like:
>
> r4 <- old_sp
> r5 <- sp + 12
> r6
> fp
> <- sp
>
> (again, off-by-one error probably, but anywayy that sp+12 can't point to
> old fp).
Nope. push store register in ascending order from the lowest address to the
highest address. push {regs} is actually the same as stmdb sp! {regs}. So,
after that initial push, you'd have:
x <- old_sp
fp <- sp + 12
r6
r5
r4 <- new_sp
so in fp is stored sp+12 that is the address where is stored the old fp :)
Hence the ldr does load the address of the previous frame in the return
register (r0).
>
> ~~~~
>
> Anyway, I've prepared more simple complete example to demonstrate the
> bug:
>
> ---- 8< ---- fp.c
> #include <stdio.h>
>
> void *func(int x)
> {
> char *fp1 = __builtin_frame_address(1);
>
> printf("Hello World! %i\n", x);
>
> fp1 += x;
> return fp1;
> }
>
> int main()
> {
> void *fp0 = __builtin_frame_address(0);
> void *fp1 = func(0);
>
> printf("fp0: %p fp1: %p\n", fp0, fp1);
> return 0;
> }
>
> The code should output
>
> Hello World! 0
> fp0: <addr> fp1: <the-same-addr!>
>
> right? on i386:
>
> $ gcc -o fp_i386 -O2 fp.c
> $ ./fp_i386
> Hello World! 0
> fp0: 0xafeda538 fp1: 0xafeda538
>
>
> but it does not on arm, look:
>
> $ arm-linux-gnueabi-gcc-4.4 -marm -o fp_arm -O2 fp.c
> $ qemu-arm -L /usr/arm-linux-gnueabi ./fp_arm
> Hello World! 0
> fp0: 0x4080010c fp1: 0x8414
>
> and the problem here is that gcc, as I think, incorrectly setups fp:
>
> $ arm-linux-gnueabi-gcc-4.4 -marm -o fp.s -S -O2 fp.c
> $ cat fp.s
> .cpu arm9tdmi
> .fpu softvfp
> .eabi_attribute 20, 1
> .eabi_attribute 21, 1
> .eabi_attribute 23, 3
> .eabi_attribute 24, 1
> .eabi_attribute 25, 1
> .eabi_attribute 26, 2
> .eabi_attribute 30, 2
> .eabi_attribute 18, 4
> .file "fp.c"
> .text
> .align 2
> .global func
> .type func, %function
> func:
> @ Function supports interworking.
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 1, uses_anonymous_args = 0
> stmfd sp!, {r4, r5, fp, lr}
> add fp, sp, #12
> mov r4, r0
> ldr r5, [fp, #0] ; <- bug here, should be [fp, #-4]
> mov r1, r4
> ldr r0, .L3
> bl printf
> add r0, r5, r4
> sub sp, fp, #12
> ldmfd sp!, {r4, r5, fp, lr}
> bx lr
> .L4:
> .align 2
> .L3:
> .word .LC0
> .size func, .-func
> .align 2
> .global main
> .type main, %function
> main:
> @ Function supports interworking.
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 1, uses_anonymous_args = 0
> stmfd sp!, {fp, lr}
> mov r0, #0
> add fp, sp, #4
> bl func
> mov r1, fp
> mov r2, r0
> ldr r0, .L7
> bl printf
> mov r0, #0
> sub sp, fp, #4
> ldmfd sp!, {fp, lr}
> bx lr
> .L8:
> .align 2
> .L7:
> .word .LC1
> .size main, .-main
> .section .rodata.str1.4,"aMS",%progbits,1
> .align 2
> .LC0:
> .ascii "Hello World! %i\012\000"
> .space 3
> .LC1:
> .ascii "fp0: %p fp1: %p\012\000"
> .ident "GCC: (Debian 4.4.6-11) 4.4.6"
> .section .note.GNU-stack,"",%progbits
Agreed, that's weird. I fails equally on my machine (though the code is
different, maybe because I'm using armv7-a as machine arch).
>
>
> and if I apply the following patch
>
> diff --git a/t/fp.s b/t/fp1.s
> index 2047b05..85bb402 100644
> --- a/t/fp.s
> +++ b/t/fp1.s
> @@ -20,7 +20,7 @@ func:
> stmfd sp!, {r4, r5, fp, lr}
> add fp, sp, #12
> mov r4, r0
> - ldr r5, [fp, #0]
> + ldr r5, [fp, #-4]
> mov r1, r4
> ldr r0, .L3
> bl printf
>
> it does work:
>
> $ arm-linux-gnueabi-gcc-4.4 -o fp fp.s
> $ arm-linux-gnueabi-gcc-4.4 -o fp1 fp1.s
> $ qemu-arm -L /usr/arm-linux-gnueabi ./fp
> Hello World! 0
> fp0: 0x4080012c fp1: 0x8414
>
> $ qemu-arm -L /usr/arm-linux-gnueabi ./fp1
> Hello World! 0
> fp0: 0x4080012c fp1: 0x4080012c
>
> that's why I think gcc generated wrong code here.
In this case I agree. It seems that no matter what, __builtin_frame_address
loads fp, no matter where it points.
>
> And to the reference, here is how code would look like if being part of
> linux:
>
> $ arm-linux-gnueabi-gcc-4.4 -o fpapcs.s -marm -mapcs
> -mno-sched-prolog -S -O2 fp.c $ arm-linux-gnueabi-gcc-4.4 -o fpapcs_gnu.s
> -marm -mapcs -mno-sched-prolog -mabi=apcs-gnu -S -O2 fp.c $ diff -u fp.s
> fpapcs.s
> --- fp.s 2012-12-13 20:34:09.000000000 +0400
> +++ fpapcs.s 2012-12-13 20:34:09.000000000 +0400
> @@ -17,16 +17,17 @@
> @ Function supports interworking.
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 1, uses_anonymous_args = 0
> - stmfd sp!, {r4, r5, fp, lr}
> - add fp, sp, #12
> + mov ip, sp
> + stmfd sp!, {r4, r5, fp, ip, lr, pc}
> + sub fp, ip, #4
> mov r4, r0
> ldr r5, [fp, #0]
> mov r1, r4
> ldr r0, .L3
> bl printf
> add r0, r5, r4
> - sub sp, fp, #12
> - ldmfd sp!, {r4, r5, fp, lr}
> + sub sp, fp, #20
> + ldmfd sp, {r4, r5, fp, sp, lr}
> bx lr
> .L4:
> .align 2
> @@ -40,17 +41,18 @@
> @ Function supports interworking.
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 1, uses_anonymous_args = 0
> - stmfd sp!, {fp, lr}
> + mov ip, sp
> + stmfd sp!, {fp, ip, lr, pc}
> + sub fp, ip, #4
thus fp points to the location where pc is stored
> mov r0, #0
> - add fp, sp, #4
> bl func
> mov r1, fp
> mov r2, r0
> ldr r0, .L7
> bl printf
> mov r0, #0
> - sub sp, fp, #4
> - ldmfd sp!, {fp, lr}
> + sub sp, fp, #12
> + ldmfd sp, {fp, sp, lr}
> bx lr
> .L8:
> .align 2
>
> $ diff -u fpapcs.s fpapcs_gnu.s
> --- fpapcs.s 2012-12-13 20:34:09.000000000 +0400
> +++ fpapcs_gnu.s 2012-12-13 20:32:51.000000000 +0400
> @@ -1,13 +1,3 @@
> - .cpu arm9tdmi
> - .fpu softvfp
> - .eabi_attribute 20, 1
> - .eabi_attribute 21, 1
> - .eabi_attribute 23, 3
> - .eabi_attribute 24, 1
> - .eabi_attribute 25, 1
> - .eabi_attribute 26, 2
> - .eabi_attribute 30, 2
> - .eabi_attribute 18, 4
> .file "fp.c"
> .text
> .align 2
>
>
> but it does not work either:
>
> $ arm-linux-gnueabi-gcc-4.4 -o fpapcs -mapcs -mno-sched-prolog fpapcs.s
> $ qemu-arm -L /usr/arm-linux-gnueabi ./fpapcs
> Hello World! 0
> fp0: 0x4080011c fp1: 0x83e0
Of course since it's the return address which is displayed
>
>
> ~~~~
>
> There are many arm abis and maybe I'm doing something stupid, but to me
> all this looks like gcc generates incorrect code.
Are you sure the code you compiled for ARM doesn't contain
__builtin_return_address instead of __builtin_frame_address?
>
> > > ... as probably required by arm calling convention (not looked at the
> > > spec yet, but it seems reasanoble, given how push is really a block
> > > str and that str stores register in ascending order).
> >
> > I have once seen a page in MSDN describing the ARM stack frame of
> > Windows CE. I don't know if ARM specified how stack frames should look
> > like.
>
> From gcc manual:
>
> `-mapcs-frame'
> Generate a stack frame that is compliant with the ARM Procedure
> Call Standard for all functions, even if this is not strictly
> necessary for correct execution of the code. Specifying
> `-fomit-frame-pointer' with this option will cause the stack
> frames not to be generated for leaf functions. The default is
> `-mno-apcs-frame'.
>
> so at least there is "ARM Procedure Call Standard". Sorry, no time to
> search for it...
>
>
> Thanks,
> Undertimed Kirill
Sorry for taking so much time to answer you.
Best regards,
Thomas
- Re: [Tinycc-devel] [PATCH] arm: Handle __builtin_frame_address(1) correctly,
Thomas Preud'homme <=