[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: strange FFT timing
From: |
David Bateman |
Subject: |
Re: strange FFT timing |
Date: |
Sun, 20 Apr 2008 22:01:07 +0200 |
User-agent: |
Thunderbird 2.0.0.12 (X11/20080306) |
Przemek Klosowski wrote:
> I would expect that FFT would be fasterst for datasets of the size
> that is a power of 2, but a quick test doesn't bear this out. I am
> using Octave Version 3.0.0 on Fedora 8 (2.6.24) runing on a 2.1 GHz
> Athlon XP 2800+ with 1GB of memory (no swap activity during the
> calculation). The 1,000,000 element FFT is almost twice as fast as
> 2^20 element FFT (see below for excerpt of the Octave session).
>
> Anyone would venture what is going on? It could be cache aliasing
> (this Athlon has a 512kB cache). In any case, FFTW seems to be very
> good handling the non-power-of-2 sizes, and seems
> to be optimized for power-of-ten sizes.
>
> v=rand(2^20,1); tic;fft(v);toc
> Elapsed time is 0.339144 seconds.
> Elapsed time is 0.290032 seconds.
> Elapsed time is 0.283474 seconds.
> Elapsed time is 0.291376 seconds.
> Elapsed time is 0.285946 seconds.
> v=rand(1e6,1); tic;fft(v);toc
> Elapsed time is 0.223884 seconds.
> Elapsed time is 0.179272 seconds.
> Elapsed time is 0.179442 seconds.
> Elapsed time is 0.184591 seconds.
> Elapsed time is 0.179012 seconds.
>
> Similar results obtained with v=rand(1e6,1);x=cputime;fft(v);cputime-x
At a guess I'd say SIMD memory alignment. With 16byte data alignment
FFTW can use the SSE2/SSE3 instructions to speed things up. Maybe your
1e6 case just got lucky. I see the reverse behavior..
D.