[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Optimization: why don't I get no speedup?
From: |
Paul E. Johnson |
Subject: |
Optimization: why don't I get no speedup? |
Date: |
Mon, 07 Jan 2002 12:58:49 -0600 |
User-agent: |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.6) Gecko/20011120 |
While profiling an application written in Obj-C, I trace a bottleneck
back to one specific method. It is a method that updates a lattice, and
it works by beginning at a point, and then taking cells within a
"radius" and updating them.
If the model does not "wrap around" the edges of the grid, there is some
wasted effort, because the for loop checks each time to see if a cell is
"out of bounds" and if it is out, then it "continues" and goes on to the
next cell. I guessed that this was wasting time, because it uses a lot
of steps to do nothing when the radius is very large, and tried to
redesign the algorithm so that the boundaries of the radius are
calculated first and then the loop iterates from edge to edge of the
legitimate radius. This eliminates the need inside the loop to check to
see if it is inside the grid and also cuts out visits to cells outside
the grid.
So I rewrote the alorithm, and got no speedup! Why? Are logic checks
just so fast that algorithmically eliminating them has no impact? I
have -O2 in the cflags, nothing special to try to speed things up.
Maybe gcc3 is being my friend here?
I'm going to show the code snip below, but I'm just bringing this up to
say that optimization is a drag when you get no speedup out of it!
And to ask the musical question: How long do you stare at a method like
this before you conclude that it is as fast as you can possibly make it?
Here is the "old method". The macro D2D is the same as the Swarm
discrete2dsiteAt() macro in the Grid2d class. I just got tired of typing
that over and over so relabeled it as D2D. The loop goes to cell
(xx,yy) and then broadcasts out from there, looping over the radius
around it.
- stepRule
{
unsigned xx, yy;
for (xx = 0; xx < xsize; xx++)
for (yy = 0; yy < ysize; yy++)
{
int i, levelHere =0;
//first flush through the population changes
for ( i = 0; i < OPINIONS; i++)
{
if ( (levelHere=*D2D(popChange[i],gridOffsets,xx,yy)) != 0)
{
int r, c, xcoord, ycoord;
BOOL outOfBoundsX=NO, outOfBoundsY=NO;
//broadcast that change across a square that goes
"radius" units in all directions from xx,yy
for(r=0;r<(2*radius)+1;r++) //r is for row
{
xcoord = xx+r-radius;
outOfBoundsX = (xcoord < 0)||(xcoord > xsize-1);//happens
often if radius is big
if (outOfBoundsX)
{
if (wrapAround == NO) continue; //if outside bounds,
then quit here
else xcoord = (xcoord + xsize) % xsize;
}
for(c=0;c<(2*radius)+1;c++) //c is for column
{
ycoord= yy+c-radius;
outOfBoundsY = (ycoord < 0)||(ycoord > ysize-1);
if (outOfBoundsY)
{
if (wrapAround == NO) continue; //if outside
bounds, then quit here
else
{
ycoord = (ycoord + ysize) % ysize;
}
}
*D2D(visibleOpinion[i],gridOffsets,xcoord,ycoord)
+= levelHere;
}
}
}
}
Here is the new faster more efficient code that is no damn faster!
- stepRule
{
unsigned xx, yy;
for (xx = 0; xx < xsize; xx++){
int leftX = xx - radius;
int rightX = xx + radius;
if (wrapAround == NO)
{
if ( leftX < 0 ) leftX = 0;
if ( rightX > xsize - 1 ) rightX = xsize - 1;
}
for (yy = 0; yy < ysize; yy++)
{
int i, levelHere =0;
int lowY = yy - radius;
int highY = yy + radius;
if (wrapAround == NO)
{
if ( lowY < 0 ) lowY = 0;
if ( highY > ysize -1) highY = ysize - 1;
}
//first flush through the population changes
for ( i = 0; i < OPINIONS; i++)
{
if ( (levelHere = *D2D(popChange[i],gridOffsets,xx,yy)) != 0)
{
int xcoord, ycoord;
if (wrapAround == NO)
{
for( xcoord = leftX; xcoord < rightX+1; xcoord++ )
{
for( ycoord=lowY; ycoord < highY+1; ycoord++)
{
*D2D(visibleOpinion[i],gridOffsets,xcoord,ycoord) +=
levelHere;
}
}
}
else
{
int r, c ; //r for row, c for column
for(r = leftX; r < rightX+1; r++)
{
xcoord = r;
if (r < 0) xcoord = xsize + r; //avoid use of slow
(xcoord+xsize)%xsize!
else if (r > xsize - 1) xcoord = r - xsize; // ditto !
for (c = lowY; c < highY+1 ; c++)
{
ycoord = c;
if (c < 0) ycoord = ysize + c; //avoid use of slow
(xcoord+xsize)%xsize!
else if (c > ysize - 1) ycoord = c - ysize; //
ditto !
*D2D(visibleOpinion[i],gridOffsets,xcoord,ycoord) +=
levelHere;
}
}
}
}
}
--
Paul E. Johnson email: address@hidden
Dept. of Political Science http://lark.cc.ukans.edu/~pauljohn
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66045 FAX: (785) 864-5700
==================================
Swarm-Support is for discussion of the technical details of the day
to day usage of Swarm. For list administration needs (esp.
[un]subscribing), please send a message to <address@hidden>
with "help" in the body of the message.
- Optimization: why don't I get no speedup?,
Paul E. Johnson <=