tinycc-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Tinycc-devel] Simple peephole optimization


From: Samir Ribić
Subject: Re: [Tinycc-devel] Simple peephole optimization
Date: Sun, 24 May 2020 18:59:26 +0200

I agree that optimization needs to be optional. Not only because it slows down compilation time, but also
because of possible bugs (operators &&, || and ? generate jumps inside expressions and might rely
on longer generated code variants)

There are numerous patterns that can be simplified

1. This is mentioned in my first post. In a sequences like
5+a+b  (sum of two variables as a part of larger _expression_)
we remark patterns like
MOV ECX,[address]
ADD EAX,ECX
(other registers are also possible)
it can be replaced with
ADD EAX,[address]

Similary, there are replaceable patterns
MOV ECX,[address]
AND EAX,ECX

MOV ECX,[address]
OR EAX,ECX

MOV ECX,[address]
XOR EAX,ECX

MOV ECX,[address]
SUB EAX,ECX

2) Comparison expressions
a==800
is translated as
MOV EAX,[address]
CMP EAX,800
MOV EAX,0
SETE AL

we can save 2 bytes by replacing last two instructions with
SETE AL
MOVZX EAX,AL

if constant we compare with is larger than 255, we can save 2 more bytes by replacing
first two instructions with

CMP DWORD[address],800

3) Single instruction which occurs in almost every loop
i++
generates
MOV EAX,[address]
MOV ECX,EAX
INC EAX
MOV [address],EAX

and we can save 9 bytes by replacing it into

INC DWORD [address]

4) Conditional _expression_:
a==5?1:2

generates
    CMP EAX,5
    MOV EAX,0
    SETE AL
    CMP EAX,0
    JE S1
    JMP S2
S1: MOV EAX,2
    JMP S3
S2: MOV EAX,1
S3:

We can save 16 bytes for similar cases:
   CMP EAX,5
   JE S2
S1: MOV EAX,2
    JMP S3
S2: MOV EAX,1
S3:

5) Constant assignement
  a=10
  generates
  MOV EAX,10
  MOV [address],EAX

  One byte can be saved by replacing it with
  MOV DWORD [address],10
* but beware of conditional expressions

6) Function prologue generates
  PUSH EBP
  MOV EBP,ESP
  SUB ESP,8
  NOP

we can save 6 bytes with
  ENTER 0,8
(however code latency will increase from 3 to 12 cycles)

7) Constant and variable parameter passing to function
  p(2)
parameter is passed using two instructions
  MOV EAX,2
  PUSH EAX

One byte saving with single instruction
 PUSH 2

8) Variable is already in register:
Sometimes (for example in
a=_expression_;
p(a)

we can see the code
MOV [address],EAX
MOV EAX,[address]
Second instruction is redundant (6 bytes save) but ...
* beware of goto to the second instruction

On Sun, May 24, 2020 at 7:16 AM Christian Jullien <address@hidden> wrote:

Hi, there is certainly a tradeoff between compilation time and execution time.

A possible way to add this peephole optimization and possibly others we’ll find later it to include this code only if –O2 is supplied.

Now, removing 400 similar sequences like this is certainly something we would like to have and on other backend if it applies.

 

How long it makes tcc slower when compiling with and without this option?

 

Wdyt?

 

From: Tinycc-devel [mailto:tinycc-devel-bounces+eligis=address@hidden] On Behalf Of Samir Ribic
Sent: Sunday, May 24, 2020 01:09
To: address@hidden
Subject: [Tinycc-devel] Simple peephole optimization

 

Tcc was never intended to be a real optimizing compiler, because it is focused on compilation speed and memory savings. However, it seems that some improvements on generated code can be done, without significant compiler speed degradation.As tcc generates machine code, rather than assembly, pattern searching  for peephole optimization might be quite fast.

 

I remarked that sequences in 386 generated code like this (Intel syntax)

MOV ECX,[memory location]

ADD EAX,ECX

appear quite often, and they can be replaced with

ADD EAX, [memory location]

which saves two bytes. Together with variants using AND, OR, XOR, ADC, SUB and SBB instructions, ]iIn a generated code from tcc.c there are about 400 similar sequences.

 

To do the replacement In a file i386-gen.c I have changed the function

void gen_opi(int op) about 20 lines below the label gen_op8:.

.....

        } else {
            gv2(RC_INT, RC_INT);
            r = vtop[-1].r;
            fr = vtop[0].r;
            o((opc << 3) | 0x01);
            o(0xc0 + r + fr * 8);
// Added code******
            unsigned char * peep;
            peep=cur_text_section->data+ind;

// op with global var

            if (peep[-8] == 0x8B &&      /*   MOV reg,regm */
                ((peep[-7] & 0xC7) == 5) &&  /* modrm= [disp32] */
                ((peep[-7] & 0x38) == (peep[-1] & 0x38 ))  /* dest reg of first==srcreg of second */
               ) {
                peep[-8]=(opc << 3) | 0x03;  /* first instruction change mov to op and swap registers */
                peep[-7] &= 0xC7;   /* clear reg field of first instruction */
                peep[-7] |= r*8;    /* set reg field of first instruction */
                ind -= 2;           /* two bytes saved */
            }
// op with local var
            else {
            if (peep[-5] == 0x8B &&      /*   MOV reg,regm */
                ((peep[-4] & 0xC7) == 0x45) &&  /* modrm= [EBP+disp8] */
                ((peep[-4] & 0x38) == (peep[-1] & 0x38 ))  /* dest reg of first==srcreg of second */
               ) {
                peep[-5]=(opc << 3) | 0x03;  /* first instruction change mov to op and swap registers */
                peep[-4] &= 0xC7;   /* clear reg field of first instruction */
                peep[-4] |= r*8;    /* set reg field of first instruction */
                ind -= 2;           /* two bytes saved */             
            }

            }
   //*************************         
        }

 

 

_______________________________________________
Tinycc-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/tinycc-devel

reply via email to

[Prev in Thread] Current Thread [Next in Thread]