Re: FFT slowup in 2.1.55

octave-maintainers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: FFT slowup in 2.1.55

From:	David Bateman
Subject:	Re: FFT slowup in 2.1.55
Date:	Tue, 2 Mar 2004 00:51:19 +0100
User-agent:	Mutt/1.4.1i

Well, I don't know when I'll get a real chance to look at this issue,
and it might not be before the next release. So could you apply the
attached patch that at least gets rid of the thrashing back and forth
between aligned and unaligned FFTW plans.

If I do something about the alignment before the next, this code should
probably stay in for a while anyway, as a sanity check on the alignment
code....

Regards
David

According to John W. Eaton <address@hidden> (on 02/27/04):
> On 27-Feb-2004, David Bateman <address@hidden> wrote:
> 
> | Ok, due to the alignment issues in FFTW 3.x I've found I can get
> | into the case where I'm calling exactly the same FFT, but the 
> | alignment changes each time I call the function. The planner in
> | oct-fftw.cc then recreates the plan each time, which is slow.
> | For small FFT's [I've tested fft(randn(64,45))], this results in 
> | a significant slowup.
> | 
> | I've tried two lines of attack to fix this problem. The first is to
> | try and force the 16-byte alignment in "class Array<T>" using
> | 
> | #ifdef HAVE_ATTRIB_ALIGN
> |     typedef T alignedT __attribute__ ((aligned(16)));
> | #else
> |     typedef T alignedT;
> | #endif
> | 
> | where HAVE_ATTRIB_ALIGN is a configure time option. I then replace the
> | "new T" in ArrayRep with "new alignedT". This appears to only give me
> | 8-byte alignment, but I've read in the gcc manual that ld might not be
> | able to do better than 8-byte alignment, so it is not clear to me if
> | 
> | 1) The code I've done is correct, but the linker won't give me 16-byte
> | alignment, or
> | 2) I've missing somewhere which also needs 16-byte alignment.
> | 
> | Does anyone have any ideas on this?
> 
> Doesn't the __attribute__ ((aligned(16)) qualifier apply to variables,
> not types?  It seems to me that attaching it to a typedef would have
> no effect.
> 
> In any case, the variable you want to align in the Array classes
> doesn't actually exist in the class.  We only have a pointer to it,
> and you don't really want the pointer to be aligned on a 16-bit
> boundary, you want the object that it points to to be aligned.  So I
> think you have to write an allocator that will do that for you.  So
> instead of writing
> 
>   explicit ArrayRep (int n) : 
>     data (new alignedT [n]), len (n), count (1) { }
> 
> I think it might work to write
> 
>   explicit ArrayRep (int n) : 
>     data (make_aligned_16_double_array (n)), len (n), count (1) { }
> 
> where "make_aligned_16_double_array" is a function that returns a
> pointer to double which has the elements of the allocated array
> aligned on a 16-byte boundary.  To do that, you will probably need to
> allocate a char array of the appropriate size (might require up to 15
> bytes of padding) then return a pointer to one of the first 15
> elements with a cast to make it a pointer to double instead of a
> pointer to char.
> 
> But the solution above won't quite work because ArrayRep is a template
> class.  So we need to do this only for some values of the template
> type paramater T (double and maybe Complex).  Some method of
> specializing these functions is needed.  Here is a simplified example
> that I think should work (I leave it up to you to determine how to
> compute the necessary offset so that the data pointed to by the pointer
> returned from the make_aligned_16_double_array function is actually on
> a 16-byte boundary).
> 
> #include <iostream>
> 
> template <class T>
> class
> Array
> {
> public:
> 
>   class ArrayRep
>   {
>   public:
> 
>     T *data;
>     int len;
> 
>     explicit ArrayRep (int n)
>       : data (new T [n]), len (n) { std::cerr << "generic" << std::endl; }
>   };
> 
>   ArrayRep *rep;
> 
>   explicit Array (int n)
>     : rep (new typename Array<T>::ArrayRep (n)) { }
> };
> 
> double *
> make_aligned_16_double_array (int n)
> {
>   // Do something magic to find the required offset (should be in the
>   // range [0, 15]. 
>   int offset = 0;
> 
>   char *buf = new char [n * sizeof (double) + offset];
> 
>   return reinterpret_cast<double *> (&buf[offset]);
> }
> 
> template <>
> Array<double>::ArrayRep::ArrayRep (int n)
>   : data (make_aligned_16_double_array (n)), len (n)
> {
>   std::cerr << "double" << std::endl;
> }
> 
> int
> main (void)
> {
>   Array<int> int_ra (2);
>   Array<double> double_ra (2);
> }
> 
> Or am I missing a simpler solution?
> 
> jwe

-- 
David Bateman                                address@hidden
Motorola CRM                                 +33 1 69 35 48 04 (Ph) 
Parc Les Algorithmes, Commune de St Aubin    +33 1 69 35 77 01 (Fax) 
91193 Gif-Sur-Yvette FRANCE

The information contained in this communication has been classified as: 

[x] General Business Information 
[ ] Motorola Internal Use Only 
[ ] Motorola Confidential Proprietary

patch
Description: Text document

[Prev in Thread]

Current Thread

[Next in Thread]

Re: FFT slowup in 2.1.55, David Bateman, 2004/03/01
- Re: FFT slowup in 2.1.55, David Bateman <=
  - Re: FFT slowup in 2.1.55, John W. Eaton, 2004/03/02

Prev by Date: Re: error saving empty matrix in HDF5 format
Next by Date: Re: error saving empty matrix in HDF5 format
Previous by thread: Re: FFT slowup in 2.1.55
Next by thread: Re: FFT slowup in 2.1.55
Index(es):
- Date
- Thread