help-gplusplus

gcc 3.4.3 performance problem illustrated


From: Kenneth Massey
Subject: gcc 3.4.3 performance problem illustrated
Date: Sat, 30 Apr 2005 16:02:38 -0400
User-agent: Mozilla Thunderbird 1.0.2 (X11/20050429)

I have noticed significantly worse performance in some of my C++ programs
compiled with gcc 3.4.3 than with gcc 3.3.4. I have boiled it down to one
relatively short program that illustrates the problem. It seems to be an issue
of excessive cache misses in certain pointer lookup operations in gcc 3.4.3
binaries. BTW, are there any tools to actually count cache misses?

If anyone has a few minutes to compile and run the following code, I would be
interested in knowing whether you see the same problem. I'm running an AMD64
Athlon 3200 with 1024 KB of cache. I compiled with

g++ -O3 -Wall -march=k8

Compiled with gcc 3.3.4 average run time: 2.0 seconds
Compiled with gcc 3.4.3 average run time: 2.9 seconds

I've noticed even more dramatic differences in larger programs that actually
do something.

I would be interested in answers to the following questions:

1) Is this observed only on AMD64, or also on x86?
2) How does gcc 4.0.0 do?
3) Are there compiler options that would improve performance? (None that I've
   tried did.)
4) What changed between gcc 3.3 and 3.4 to cause this?

If you have any spare time, I think this is an interesting example, and worth
the effort for someone to figure out. I'm afraid my compiler expertise is not
sufficient, so I am asking for some help. Thanks.



Code:

// Run time is anywhere from 33% to 50% longer when compiled with gcc 3.4.3
// than with gcc 3.3.4.
// Compiled with g++ -O3 -Wall -march=k8 (same performance lag observed
// with -O2).
//
// Objects are created in a hierarchy of classes.
// When referenced, it seems that the pointer lookups
//    must cause more cache misses in gcc 3.4.3 binaries.

#include <stdio.h>
#include <time.h>      // clock(), clock_t, CLOCKS_PER_SEC
#include <vector>

class mytype_A {
 public:
  int id;
  mytype_A():id(0) {}
};

class mytype_B {
 public:
  mytype_A* A;
  mytype_B(mytype_A* p):A(p) {}
};

class mytype_C {
 public:
  mytype_B* B;
  mytype_C(mytype_B* p):B(p) {}
};


class mytype_D {
 public:
  // mytype_C* C[2];          // less performance difference with simple arrays
  std::vector<mytype_C*> C;
  int junk[3];                // affects performance (must cause cache misses)

 public:
  // note: a1 is accepted but never used; both entries below point at a0
  mytype_D(mytype_A* a0, mytype_A* a1) {
    //    C[0] = new mytype_C(new mytype_B(a0));
    //    C[1] = new mytype_C(new mytype_B(a0));
    C.push_back(new mytype_C(new mytype_B(a0)));
    C.push_back(new mytype_C(new mytype_B(a0)));
  }
};



int main() {
  const int k = 5000;              // run-time not linear in k
  mytype_A* A[k+1];                // k+1 entries: A[k] is read below when i == 0
  mytype_D* D[k];
  for (int i=0;i<=k;i++)
    A[i] = new mytype_A();
  for (int i=0;i<k;i++)
    D[i] = new mytype_D(A[i],A[k-i]);    // intentionally make some pointers
                                         // farther apart

  clock_t before = clock();

  int k0 = 0;
  for (int i=0;i<k;i++) {
    k0 = 0;
    for (int j=0;j<k;j++) {         // run through list of D's, and reference
                                    // pointers
      mytype_D* d = D[j];
      if (d->C[0]->B->A->id)     k0++;
      if (d->C[1]->B->A->id)     k0++;
    }
  }
  printf("%d\n",k0);                // don't allow compiler to optimize away k0

  printf("time: %f\n",(double)(clock()-before)/CLOCKS_PER_SEC);

  return 0;
}
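
Lacking a tool that counts cache misses, one rough way to probe the
hypothesis is to look at how far apart the objects touched in the inner loop
actually sit in memory under each compiler. The following is only a sketch:
address_span is a hypothetical helper that is not part of the test case above,
and it assumes std::size_t is wide enough to hold a pointer value, which is
true on the usual x86 and AMD64 targets but is not guaranteed by the standard.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: distance in bytes between the lowest and the highest
// address in ptrs. Returns 0 for an empty list.
// Assumption: std::size_t can represent any object pointer value.
std::size_t address_span(const std::vector<const void*>& ptrs) {
  if (ptrs.empty()) return 0;
  std::size_t lo = reinterpret_cast<std::size_t>(ptrs[0]);
  std::size_t hi = lo;
  for (std::size_t i = 1; i < ptrs.size(); ++i) {
    std::size_t p = reinterpret_cast<std::size_t>(ptrs[i]);
    if (p < lo) lo = p;   // track lowest address seen so far
    if (p > hi) hi = p;   // track highest address seen so far
  }
  return hi - lo;
}
```

In the test case, one could push d->C[0], d->C[0]->B and d->C[0]->B->A for
every D[j] into such a list and print the span for each build. A noticeably
larger span in the gcc 3.4.3 binary would be consistent with the cache-miss
explanation, though it would not prove it.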

-- 
Kenneth Massey
http://www.masseyratings.com

