octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: conv2 performance


From: John Swensen
Subject: Re: conv2 performance
Date: Mon, 1 Mar 2010 09:17:38 -0500

On Feb 28, 2010, at 10:53 AM, John W. Eaton wrote:
> 
> 
> Maybe there is still room for improvement here.  I would happily use a
> free software library with a GPL-compatible license to implement this
> function, but I don't know whether one is available.
> 
> jwe
> 

I have recently been doing a lot of 2D convolutions.  I think the fastest 
method should not involve loops of any kind.  As convolution in the time domain 
(or spatial domain when considering images) is multiplication in the frequency 
domain, the fastest method is to take the FFT of both image and kernel, 
dot-multiply them, then take the inverse FFT.  Since the FFT method is usually 
provided by FFTW, this should be optimized and quite fast.  Of course, there 
has to be some padding that takes place to make sure both 'images' are the same 
size.  I was using Matlab for this computation and the speed improvement of the 
FFT method over the Matlab-provided conv2 was considerable (100 seconds versus 
2 second; I was convolving a 2048x2048 image with a 256x256 kernel).

I think the method is formally called the overlap-add method 
(http://en.wikipedia.org/wiki/Overlap-add_method).  I used a script from 
MatlabCentral (no flaming please, as I already saw the discussion that has been 
going on for a week or two).  This is the method used for many GPGPU 
implementations.  There is an in-depth description of the best way to do the 
padding in an NVidia white paper that can be found at
http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/convolutionFFT2D/doc/convolutionFFT2D.pdf

John


reply via email to

[Prev in Thread] Current Thread [Next in Thread]