
Re: [Discuss-gnuradio] complex dotprod speedup


From: Stephane Fillod
Subject: Re: [Discuss-gnuradio] complex dotprod speedup
Date: Sat, 20 Nov 2004 15:55:18 +0100
User-agent: Mutt/1.5.6+20040907i

On Wed, Nov 10, 2004 at 04:42:51PM -0800, Eric Blossom wrote:
> Thanks to some serious SSE and 3DNow! hacking by Stephane Fillod, we
> now have a much faster version of the complex/complex/complex dot
> product function.  It sits at the bottom of
> gr.freq_xlating_fir_filter_ccc, which performs channel selection
> and digital downconversion on complex data.  This is
> really handy when using the USRP, since we're dealing with complex
> data on the host.
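
For readers new to the thread, here is a rough sketch of what "channel
selection and digital downconversion" amounts to, and of where the
complex dot product shows up in it. This is plain Python for
illustration only, not the GNU Radio code: the function name
freq_xlating_fir and all of its parameters are made up for this
example, and the real block may well organize the work differently.

```python
import cmath


def freq_xlating_fir(samples, taps, center_freq, sample_rate, decimation):
    """Hypothetical sketch: mix down, FIR filter, decimate."""
    # Step 1: rotate the spectrum so the wanted channel lands at 0 Hz.
    step = -2j * cmath.pi * center_freq / sample_rate
    mixed = [s * cmath.exp(step * n) for n, s in enumerate(samples)]

    ntaps = len(taps)
    out = []
    # Step 2: evaluate the FIR filter only at every `decimation`-th
    # position, so no work is spent on outputs that would be dropped.
    for start in range(0, len(mixed) - ntaps + 1, decimation):
        window = mixed[start:start + ntaps]
        # This sum is the complex/complex dot product (one per output).
        acc = sum(t * w for t, w in zip(taps, reversed(window)))
        out.append(acc)
    return out
```

Each output sample costs one dot product over all the taps, which is
why the speed of that inner function dominates the cost of the filter.
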
> 
> Below is a selection of benchmark results on different machines.  One
> thing that I find interesting is the wide variation between the
> generic and SIMD times as a function of machine microarchitecture.
> The generic implementations are C++ with partial loop unrolling.
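
To make "partial loop unrolling" concrete, here is a minimal sketch of
the shape such a generic routine can take. It is written in Python
only to show the structure (four independent accumulators plus a
cleanup loop), so it says nothing about speed; the actual generic code
is C++ and is not reproduced here. The name dotprod_ccc_generic and
the unroll factor of 4 are assumptions for the example, and it assumes
samples holds at least as many items as taps.

```python
def dotprod_ccc_generic(samples, taps):
    """Hypothetical sketch: complex dot product, unrolled by 4."""
    n = len(taps)
    # Independent accumulators let consecutive multiply-adds overlap
    # instead of waiting on a single running sum.
    acc0 = acc1 = acc2 = acc3 = 0j
    i = 0
    while i + 4 <= n:
        acc0 += samples[i] * taps[i]
        acc1 += samples[i + 1] * taps[i + 1]
        acc2 += samples[i + 2] * taps[i + 2]
        acc3 += samples[i + 3] * taps[i + 3]
        i += 4
    # Cleanup loop for tap counts that are not a multiple of 4.
    while i < n:
        acc0 += samples[i] * taps[i]
        i += 1
    return (acc0 + acc1) + (acc2 + acc3)
```

A SIMD version keeps the same idea but holds the accumulators in
vector registers and handles several products per instruction.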

Thanks for the results, Eric. I've posted them on the wiki at
http://comsec.com/wiki?SIMDNumberCrunching

Here are mine on an AMD Duron 1.0 GHz, which does okay as long as the
data set doesn't overflow its tiny L2 cache (64KiB, doh!) by too much:

address@hidden tests]$ ./benchmark_dotprod_ccc 

   generic: taps:  256  input: 4e+07  cpu: 150.394 taps/sec:  6.809e+07
 3DNow!Ext: taps:  256  input: 4e+07  cpu: 45.439  taps/sec:  2.254e+08
    3DNow!: taps:  256  input: 4e+07  cpu: 49.981  taps/sec:  2.049e+08
       SSE: taps:  256  input: 4e+07  cpu: 56.873  taps/sec:    1.8e+08


> taps     = number of filter taps
> input    = number of input samples
> cpu      = combined user+sys cpu time
> taps/sec = derived performance measure.  Higher is better.
> 
> 
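
The taps/sec column appears to be simply taps * input / cpu; the
figures in this thread are consistent with that reading. A small
sketch for redoing the arithmetic, with helper names invented for
this example:

```python
def taps_per_sec(taps, n_input, cpu_seconds):
    """Tap multiply-accumulates per CPU second (assumed definition)."""
    return taps * n_input / cpu_seconds


def speedup(generic_cpu, simd_cpu):
    """How many times faster the SIMD run is than the generic one."""
    return generic_cpu / simd_cpu


# Figures copied from the benchmark output in this thread.
duron_generic = taps_per_sec(256, 4e7, 150.394)
duron_3dnowext_gain = speedup(150.394, 45.439)
p4_sse_gain = speedup(156.850, 23.241)
```

By that measure 3DNow!Ext is worth roughly 3.3x on the Duron, while
SSE is worth roughly 6.7x on the Pentium 4.
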
> === This machine is a Pentium M (1.4 GHz) ===
> 
> address@hidden tests]$ ./benchmark_dotprod_ccc
>    generic: taps:  256  input: 4e+07  cpu: 121.476  taps/sec:   8.43e+07
>        SSE: taps:  256  input: 4e+07  cpu:  39.010  taps/sec:  2.625e+08
> 
> === This machine is an Athlon MP 1800+ (1.5 GHz) ===
> 
> address@hidden tests]$ ./benchmark_dotprod_ccc
>    generic: taps:  256  input: 4e+07  cpu: 118.090  taps/sec:  8.671e+07
>  3DNow!Ext: taps:  256  input: 4e+07  cpu:  29.705  taps/sec:  3.447e+08
>     3DNow!: taps:  256  input: 4e+07  cpu:  33.213  taps/sec:  3.083e+08
>        SSE: taps:  256  input: 4e+07  cpu:  37.242  taps/sec:   2.75e+08
> 
> === This machine is a Pentium 4 (1.7 GHz) ===
> 
> address@hidden tests]$ ./benchmark_dotprod_ccc
>    generic: taps:  256  input: 4e+07  cpu: 156.850  taps/sec:  6.529e+07
>        SSE: taps:  256  input: 4e+07  cpu:  23.241  taps/sec:  4.406e+08



