octave-maintainers

malloc/erase


From: Paul Thomas
Subject: malloc/erase
Date: Mon, 29 Mar 2004 20:53:44 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

The following is an exchange of messages between Paul Kienzle and Paul Thomas
concerning comparisons between new, malloc, and STL vector for allocating memory.
Paul Thomas's original questions are quoted with '>'; the unquoted text is Paul Kienzle's reply.


On Mon, Mar 29, 2004 at 01:47:00PM +0200, THOMAS Paul Richard wrote:

> Paul,
>
> Two things that seem surprising to me:
>
> 1) On a 2.5GHz pentium with Windows 2000:
>
>          cygwin32        11.17s (using clock() )
>          Ccygwin32       10.54s
>
> which is a factor of two or so slower than the Athlon1700 with XP.  I wonder
> if this indicates a dependence on Windows version, as well?

That wouldn't surprise me.


> 2) Replacing new+delete or malloc+erase with:
>
>    vector<double> myvec(1);
>    double *myarray = &myvec[0];
>    if (myarray == NULL) { printf("alloc failed\n"); exit(1); }
>    else { myarray = NULL; myvec.clear(); }

vector throws an exception if there is not enough memory, so myarray
will never be NULL, except maybe if the array is of length 0.
There is no reason to call clear() since the destructor will
release the data when the vector goes out of scope.
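For what it's worth, here is a sketch (not part of the original exchange) of what
the replacement looks like once the NULL test and the clear() are dropped; an
allocation failure would surface as a std::bad_alloc exception instead:

#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>
using namespace std;

int main()
{
  try
  {
    vector<double> myvec(1);       // throws bad_alloc on failure, never returns NULL
    double *myarray = &myvec[0];   // raw pointer into the vector's storage
    myarray[0] = 1.0;
  }                                // destructor releases the storage here; no clear() needed
  catch (bad_alloc&)
  {
    printf("alloc failed\n");
    exit(1);
  }
  return 0;
}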

This is the code I'm using for timing:

#include <vector>
using namespace std;

int main()
{
  for (int iloop = 0; iloop < 10000000; iloop++)
  {
    // Allocate a one-element vector, take a raw pointer into it,
    // and let the destructor free it at the end of each iteration.
    vector<double> myvec(1);
    double *myarray = &myvec[0];
  }
  return 0;
}



> is ten times faster than either (g++ -O2)
>
>      Vcygwin32      1.094s
>
> Taking at face value the standard library's guarantee that containers
> automatically release their resources when they go out of scope, and
> eliminating the clear(), drops this time to 0.469s.  I wonder how this is
> possible, when vector is presumably doing the same as new or malloc?

I'm a little worried a clever optimizer will eliminate
most of this loop.  We really ought to be assigning
to the vector and creating a total.  This will also show
us how significant the cost of alloc is compared to
e.g., a trig function.  However, I already have lots
of data using the old method, so I'll stick with it for now.
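As a rough sketch (not from the original exchange) of what such a loop might look
like, assuming the idea is simply to write into the vector and carry a running
total out of the loop so the optimizer cannot discard the allocations:

#include <cstdio>
#include <vector>
using namespace std;

int main()
{
  double total = 0.0;

  for (int iloop = 0; iloop < 10000000; iloop++)
  {
    vector<double> myvec(1);
    myvec[0] = iloop;            // touch the allocated storage
    total += myvec[0];           // keep a result that survives the loop
  }

  printf("total = %g\n", total); // the output keeps the whole loop observable
  return 0;
}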

I ran this test on a few other boxen.  vector is nearer
in speed to malloc than to new on all of them:

         vec     new     alloc
Linux    3.39    4.92    3.17
IRIX     3.03    4.15    2.21
IRIXgcc  2.94    4.28    2.23
Mac10.3  9.13   11.22    9.80
ming32   3.02   14.45   12.65
cyg32    3.90   16.17   14.35
ming33          18.60   12.27
cyg33           72.04   24.34

Notice that the results for vector on Linux are right in line
with the values on the slightly slower Windows machine, so I'm
inclined to accept them as reasonable.

Reading through /usr/include/c++/3.2/bits/stl_alloc.h, they have this
to say:

   __malloc_alloc_template

        A malloc-based allocator.  Typically slower than the
        __default_alloc_template.  Typically thread safe and
        more storage efficient.

   __default_alloc_template:

        Default node allocator.  Uses __mem_interface for its
        underlying requests (and makes as few requests as possible).    
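To make the contrast concrete, here is a sketch (not from the thread) of a
malloc-backed allocator plugged into vector through the standard allocator
interface; malloc_allocator is a made-up name, and this uses the minimal
allocator form from later C++ standards rather than the libstdc++-internal
templates quoted above:

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical malloc-backed allocator, standing in for __malloc_alloc_template.
template <typename T>
struct malloc_allocator
{
  typedef T value_type;

  malloc_allocator() {}
  template <typename U> malloc_allocator(const malloc_allocator<U>&) {}

  T* allocate(std::size_t n)
  {
    void *p = std::malloc(n * sizeof(T));   // every request goes straight to malloc
    if (!p)
      throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) { std::free(p); }
};

template <typename T, typename U>
bool operator==(const malloc_allocator<T>&, const malloc_allocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const malloc_allocator<T>&, const malloc_allocator<U>&) { return false; }

int main()
{
  // Swap out the default (node/pool) allocator for the malloc-backed one.
  std::vector<double, malloc_allocator<double> > v(1);
  v[0] = 1.0;
  return 0;
}

The point of the default node allocator, per the comment above, is to avoid most
of those underlying requests by pooling them.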



> Should we be using stl vectors in ArrayRep and so on, or at least copying
> the content of stl_vector.h and modifying it for octave?

This is a question for John and David (who is working on memory
alignment for FFTW).

Given the speedup on Windows I'm all for it, especially
since it is a win over new[] everywhere I tested.  Make
sure though that we don't take a big hit in memory efficiency.


Paul Kienzle
address@hidden


--- Begin Message ---
Subject: RE: malloc/erase
Date: Mon, 29 Mar 2004 17:41:09 +0200
Paul,

The reason is quite simple; I was formally bashed around the head for
subscribing to the lists from a CEA computer.  On top of that, access to my
personal ISP has been blocked, as of last week.  All this will change
totally if ITER comes here and fusion is displaced to the other side of the
fence.  For now, though, we just have to put up with the security measures.

The plan was to post it to the list, from home, this evening.  I was hoping
as well that you would have put me right if the part about vectors was off
kilter - I still feel as if I am seriously out of my depth a lot of the
time.  To paraphrase Abe Lincoln, "It is better to be thought a fool than to
post on the list and remove all doubt."

I just tried the comparison on a Compaq Tru64 system; there the stl method
is slower than both new and malloc.  I cannot quantify because clock() seems
to be broken in gcc.
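For reference, the times quoted in this thread were taken "using clock()"; a
minimal sketch (not from the thread) of that sort of measurement, which is also
where a broken clock() would show up, might be:

#include <cstdio>
#include <ctime>
#include <vector>
using namespace std;

int main()
{
  clock_t start = clock();       // CPU time before the allocation loop

  for (int iloop = 0; iloop < 10000000; iloop++)
  {
    vector<double> myvec(1);
    myvec[0] = iloop;
  }

  clock_t stop = clock();        // a broken clock() shows up as -1 or a value that never advances
  printf("%.3fs\n", double(stop - start) / CLOCKS_PER_SEC);
  return 0;
}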

Regards

Paul
  

-----Original Message-----
From: Paul Kienzle [mailto:address@hidden
Sent: Monday, 29 March 2004 17:27
To: THOMAS Paul Richard
Cc: address@hidden; address@hidden
Subject: Re: malloc/erase


Paul, 

I'm CC'ing to John and David --- I'm not sure why you are not 
posting this to the list, other than better response time from me
of course (I only read the list in the evening 8-)


--- End Message ---
