Re: [avr-gcc-list] '-morder' option with Avr-libc: comparison table
From: Dmitry K.
Subject: Re: [avr-gcc-list] '-morder' option with Avr-libc: comparison table
Date: Mon, 14 Jan 2008 17:16:22 +1000
User-agent: KMail/1.5
On Monday 14 January 2008 10:55, Andy wrote:
> Great work!
>
> morder1 and my order were very close. I think the difference only
> becomes apparent when operands are 4 bytes or longer. morder2 is very
> bad as early assignment of a byte in an odd register will bump
> assignment of wider operands. So you get a few extra moves. (which are
> more of a problem on At90s8515, than mega)
Hi.
There are a few functions where your order is considerably
better than '-morder1' (although in the summary totals
'-morder1' is slightly better). Below I include the full
reports for both.
> So I would expect floating point only to show difference between my
> order and morder1. Also note, mcall prolog will hide stack usage effect
> on text (push/pop) size- since any number of push/pop >0 has same size.
> (I tend to leave off mcall prolog as safety check on stack impact)
>
> How did you determine stack usage ?
This is the real stack usage, with or without the
'-mcall-prologues' option: it is determined by simulation.
The algorithm is:
- fill the stack with some value
- run the target function
- find the first corrupted byte in the stack (this gives the depth)
- repeat the above with another fill value (to avoid a possible
  collision)
- link the program with a dummy function ('reti')
- repeat the above
- subtract the two depths and add 2 (the stack usage of the
  dummy function)
The resulting stack usage:
- includes all internally called functions
- does not include the stack used for arguments (printf, scanf).
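The measurement steps above can be sketched in C roughly as follows. This is only an illustrative host-side model, not the actual test harness: the byte array standing in for the simulated stack, the fill values 0xAA and 0x55, the helper names (run_fn, depth_with_fill, measure_depth, stack_usage) and the two example "functions" are all assumptions made for the sketch. Taking the larger of the two depths is one plausible reading of the "repeat with another value" step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 256

/* Hypothetical model of the simulated stack: index STACK_SIZE-1 is the
   first byte pushed, and the stack grows toward index 0 (as on AVR). */
static unsigned char stack_mem[STACK_SIZE];

/* A "run" of the program in the simulator; it scribbles on the stack. */
typedef void (*run_fn)(unsigned char *stack, size_t size);

/* Fill the stack, run, and locate the deepest byte that was changed. */
static size_t depth_with_fill(run_fn run, unsigned char fill)
{
    memset(stack_mem, fill, STACK_SIZE);
    run(stack_mem, STACK_SIZE);
    for (size_t i = 0; i < STACK_SIZE; i++)
        if (stack_mem[i] != fill)
            return STACK_SIZE - i;
    return 0;
}

/* Repeat with a second fill value: a byte written with the same value
   as the fill would otherwise go unnoticed (the "collision"). */
static size_t measure_depth(run_fn run)
{
    size_t a = depth_with_fill(run, 0xAA);
    size_t b = depth_with_fill(run, 0x55);
    return a > b ? a : b;
}

/* Subtract the depth of the dummy build and add back the 2 bytes the
   dummy function itself uses (its return address). */
static size_t stack_usage(run_fn target, run_fn dummy)
{
    size_t t = measure_depth(target);
    size_t d = measure_depth(dummy);
    assert(t >= d);
    return t - d + 2;
}

/* Hypothetical stand-ins for simulator runs: touch the top n bytes,
   writing the value `deepest` into the lowest touched byte. */
static void use_top(unsigned char *stack, size_t size, size_t n,
                    unsigned char deepest)
{
    memset(stack + size - n, 0x11, n);
    stack[size - n] = deepest;
}

/* Dummy build: 5 bytes of startup overhead + 2 bytes return address. */
static void example_dummy(unsigned char *s, size_t n)
{
    use_top(s, n, 7, 0x11);
}

/* Target build: same overhead, 18 bytes used by the function, and the
   deepest byte happens to equal the first fill value. */
static void example_target(unsigned char *s, size_t n)
{
    use_top(s, n, 23, 0xAA);
}
```

In this model, stack_usage(example_target, example_dummy) reports 18 bytes, even though the first fill value alone would underestimate the target depth by one byte.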
> -frename-registers is not well described - no idea what it does! I tend
> to use -Os as benchmark - which excludes this.
To be precise, all test cases include '-Os' in the option list;
'-frename-registers' was given in addition.
Regards,
Dmitry.
The full report for the case where '-morder1' is added:
AVR:    at90s8515__________________________ atmega8____________________________
GCC:    3.3.6 3.4.6 4.0.4 4.1.2 4.2.2 4.3.X 3.3.6 3.4.6 4.0.4 4.1.2 4.2.2 4.3.X
-------------------------------------------------------------------------------
bsearch("z",s,sizeof(s),1,cmp)
Flash:    276   270   266   266   266   268   214   212   208   208   208   204
Stack:     16    16    16    16    16    16    16    16    16    16    16    16
Time:     534   533   530   530   530   530   327   334   331   331   331   327
-------------------------------------------------------------------------------
dtostre(1.2345,s,6,0)
Flash:   1000   996  1102  1100  1090  1172   932   928  1018  1004   998  1078
Stack:     15    15    15    19    19    17    15    15    15    19    19    17
Time:    1197  1196  1284  1296  1296  1290  1058  1057  1118  1136  1136  1140
-------------------------------------------------------------------------------
dtostrf(1.2345,15,6,s)
Flash:   1666  1688  1648  1666  1632  1692  1514  1530  1522  1544  1506  1570
Stack:     35    38    39    38    37    39    35    38    39    38    37    39
Time:    1668  1619  1653  1604  1595  1616  1479  1436  1480  1442  1432  1455
-------------------------------------------------------------------------------
free(p)
Flash:    550   558   552   552   562   572   498   506   510   510   516   516
Stack:      4     4     4     4     4     4     4     4     4     4     4     4
Time:     220   227   227   227   229   228   200   207   210   210   211   209
-------------------------------------------------------------------------------
malloc(1)
Flash:    550   558   552   552   562   572   498   506   510   510   516   516
Stack:      2     4     4     4     4     4     2     4     4     4     4     4
Time:     184   191   193   193   195   194   166   173   177   177   178   176
-------------------------------------------------------------------------------
qsort(s,sizeof(s),1,cmp)
Flash:   1332  1306  1220  1222  1242  1488  1074  1070   994   996  1008  1262
Stack:     36    36    36    36    38    40    36    36    36    36    38    40
Time:   22017 21944 20182 20474 20914 20949 16965 16955 16002 16294 16678 16854
-------------------------------------------------------------------------------
rand()
Flash:    528   528   498   492   508   498   498   498   478   480   484   456
Stack:     18    18    18    18    18    18    18    18    18    18    18    18
Time:    1493  1493  1484  1484  1488  1484  1491  1491  1482  1482  1484  1475
-------------------------------------------------------------------------------
realloc((void*)0,1)
Flash:   1180  1190  1156  1140  1172  1172  1052  1060  1052  1044  1056  1052
Stack:     20    22    20    18    22    20    20    22    20    18    22    20
Time:     300   307   301   293   311   304   277   284   280   272   289   279
-------------------------------------------------------------------------------
sprintf_min(s,"%d",12345)
Flash:   1266  1232  1280  1200  1204  1174  1126  1106  1150  1074  1076  1046
Stack:     51    51    54    53    59    53    51    51    54    53    59    53
Time:    1826  1821  1803  1809  1844  1807  1686  1686  1672  1677  1710  1673
-------------------------------------------------------------------------------
sprintf(s,"%d",12345)
Flash:   1698  1670  1696  1632  1664  1606  1518  1496  1518  1454  1490  1422
Stack:     54    54    57    57    58    57    54    54    57    57    58    57
Time:    1631  1626  1637  1616  1608  1619  1544  1545  1554  1535  1527  1535
-------------------------------------------------------------------------------
sprintf_flt(s,"%e",1.2345)
Flash:   3398  3312  3292  3254  3330  3206  3088  3032  2998  2960  3038  2936
Stack:     61    61    63    64    66    65    61    61    63    64    66    65
Time:    2509  2494  2500  2482  2513  2503  2280  2276  2282  2263  2297  2292
-------------------------------------------------------------------------------
sscanf_min("12345","%d",&i)
Flash:   1456  1444  1474  1474  1482  1500  1336  1328  1342  1344  1352  1366
Stack:     49    49    53    53    59    55    49    49    53    53    59    55
Time:    1679  1657  1625  1628  1623  1690  1388  1372  1344  1347  1341  1397
-------------------------------------------------------------------------------
sscanf("12345","%d",&i)
Flash:   1796  1768  1890  1872  1832  1874  1638  1614  1688  1674  1652  1674
Stack:     50    50    54    54    61    56    50    50    54    54    61    56
Time:    1713  1701  1699  1694  1739  1761  1430  1419  1422  1417  1451  1473
-------------------------------------------------------------------------------
sscanf_flt("1.2345","%e",&x)
Flash:   4122  4066  4188  4110  4210  4468  3790  3744  3796  3756  3848  4088
Stack:    124   124   126   128   140   132   124   124   126   128   140   132
Time:    3029  3019  3022  3031  3092  3171  2656  2651  2647  2659  2715  2784
-------------------------------------------------------------------------------
strtod("1.2345",&p)
Flash:   1534  1516  1666  1622  1622  1902  1434  1420  1526  1496  1496  1760
Stack:     20    20    20    22    22    20    20    20    20    22    22    20
Time:    1238  1234  1271  1268  1266  1294   977   975  1004  1005  1003  1032
-------------------------------------------------------------------------------
strtol("12345",&p,0)
Flash:    772   802   748   840   896   804   746   766   736   798   800   748
Stack:     16    16    17    17    25    21    16    20    21    21    21    21
Time:     896   918   863   915   999   904   675   692   649   689   700   663
-------------------------------------------------------------------------------
strtoul("12345",&p,0)
Flash:    742   740   786   752   804   866   716   720   820   794   778   824
Stack:     16    16    19    19    25    27    16    20    25    25    25    27
Time:     889   887   904   886   951  1019   668   673   740   726   726   772
===============================================================================
Summary
Flash:  23866 23644 24014 23746 24078 24834 21672 21536 21866 21646 21822 22518
Stack:    587   594   615   620   673   644   587   602   625   630   669   644
Time:   43023 42867 41178 41430 42193 42363 35267 35226 34394 34662 35209 35536
The full report for the case of Andrew Hutchinson's experimental order:
AVR:    at90s8515__________________________ atmega8____________________________
GCC:    3.3.6 3.4.6 4.0.4 4.1.2 4.2.2 4.3.X 3.3.6 3.4.6 4.0.4 4.1.2 4.2.2 4.3.X
-------------------------------------------------------------------------------
bsearch("z",s,sizeof(s),1,cmp)
Flash:    280   274   266   266   266   268   216   214   208   208   208   212
Stack:     16    16    16    16    16    16    16    16    16    16    16    16
Time:     536   535   530   530   530   530   328   335   331   331   331   331
-------------------------------------------------------------------------------
dtostre(1.2345,s,6,0)
Flash:    998   996  1108  1098  1084  1088   930   928  1024  1002   994   994
Stack:     15    15    16    19    19    19    15    15    16    19    19    19
Time:    1196  1196  1291  1298  1296  1297  1057  1057  1124  1137  1136  1136
-------------------------------------------------------------------------------
dtostrf(1.2345,15,6,s)
Flash:   1632  1684  1650  1688  1650  1702  1474  1526  1526  1560  1520  1576
Stack:     37    38    39    38    39    39    37    38    39    38    39    39
Time:    1682  1619  1677  1627  1625  1638  1487  1436  1496  1458  1455  1470
-------------------------------------------------------------------------------
free(p)
Flash:    544   542   556   556   556   576   494   492   512   512   510   524
Stack:      4     4     4     4     4     6     4     4     4     4     4     6
Time:     228   223   227   227   229   238   209   205   210   210   211   218
-------------------------------------------------------------------------------
malloc(1)
Flash:    544   542   556   556   556   576   494   492   512   512   510   524
Stack:      4     4     4     4     4     6     4     4     4     4     4     6
Time:     194   189   193   193   195   204   176   172   177   177   178   185
-------------------------------------------------------------------------------
qsort(s,sizeof(s),1,cmp)
Flash:   1360  1328  1228  1230  1250  1538  1092  1084  1002  1004  1016  1284
Stack:     36    36    36    36    38    40    36    36    36    36    38    40
Time:   22627 22558 20782 21074 21514 20454 17570 17558 16602 16894 17278 15982
-------------------------------------------------------------------------------
rand()
Flash:    528   528   498   492   508   498   498   498   478   480   484   456
Stack:     18    18    18    18    18    18    18    18    18    18    18    18
Time:    1493  1493  1484  1484  1488  1484  1491  1491  1482  1482  1484  1475
-------------------------------------------------------------------------------
realloc((void*)0,1)
Flash:   1172  1164  1176  1152  1158  1194  1050  1038  1058  1046  1044  1076
Stack:     22    20    20    20    20    24    22    20    20    20    20    24
Time:     310   297   301   301   303   322   287   275   280   280   281   298
-------------------------------------------------------------------------------
sprintf_min(s,"%d",12345)
Flash:   1274  1240  1284  1204  1204  1174  1130  1110  1152  1076  1076  1046
Stack:     51    51    54    53    59    53    51    51    54    53    59    53
Time:    1838  1833  1813  1819  1844  1807  1692  1692  1677  1682  1710  1673
-------------------------------------------------------------------------------
sprintf(s,"%d",12345)
Flash:   1706  1678  1700  1636  1664  1606  1522  1500  1520  1456  1490  1422
Stack:     54    54    57    57    58    57    54    54    57    57    58    57
Time:    1643  1638  1647  1626  1608  1619  1550  1551  1559  1540  1527  1535
-------------------------------------------------------------------------------
sprintf_flt(s,"%e",1.2345)
Flash:   3422  3332  3296  3258  3324  3200  3100  3042  2996  2958  3032  2930
Stack:     61    61    63    64    66    65    61    61    63    64    66    65
Time:    2523  2508  2511  2493  2512  2502  2287  2283  2287  2268  2296  2291
-------------------------------------------------------------------------------
sscanf_min("12345","%d",&i)
Flash:   1464  1452  1496  1494  1494  1518  1340  1332  1358  1360  1360  1380
Stack:     49    49    53    53    59    55    49    49    53    53    59    55
Time:    1679  1657  1625  1628  1623  1690  1388  1372  1344  1347  1341  1397
-------------------------------------------------------------------------------
sscanf("12345","%d",&i)
Flash:   1810  1782  1906  1888  1844  1886  1648  1624  1702  1688  1660  1682
Stack:     50    50    54    54    61    56    50    50    54    54    61    56
Time:    1713  1701  1689  1694  1739  1756  1430  1419  1412  1417  1451  1468
-------------------------------------------------------------------------------
sscanf_flt("1.2345","%e",&x)
Flash:   4152  4090  4222  4134  4222  4492  3810  3760  3822  3774  3858  4110
Stack:    124   124   130   128   140   132   124   124   130   128   140   132
Time:    3031  3021  3041  3035  3094  3173  2658  2653  2670  2663  2717  2788
-------------------------------------------------------------------------------
strtod("1.2345",&p)
Flash:   1534  1516  1672  1622  1622  1902  1434  1420  1528  1496  1496  1764
Stack:     20    20    24    22    22    20    20    20    24    22    22    20
Time:    1238  1234  1288  1268  1266  1294   977   975  1025  1005  1003  1034
-------------------------------------------------------------------------------
strtol("12345",&p,0)
Flash:    772   802   748   840   900   804   746   766   736   798   800   748
Stack:     16    16    17    17    25    21    16    20    21    21    21    21
Time:     896   918   863   915   999   904   675   692   649   689   700   663
-------------------------------------------------------------------------------
strtoul("12345",&p,0)
Flash:    742   740   786   752   800   866   716   720   816   790   774   824
Stack:     16    16    19    19    25    27    16    20    25    25    25    27
Time:     889   887   904   886   951  1019   668   673   740   726   726   772
===============================================================================
Summary
Flash:  23934 23690 24148 23866 24102 24888 21694 21546 21950 21720 21832 22552
Stack:    593   592   624   622   673   654   593   600   634   632   669   654
Time:   43716 43507 41866 42098 42816 41931 35930 35839 35065 35306 35825 34716
End of list