Re: [Discuss-gnuradio] complex dotprod speedup
From: Stephane Fillod
Subject: Re: [Discuss-gnuradio] complex dotprod speedup
Date: Sat, 20 Nov 2004 15:55:18 +0100
User-agent: Mutt/1.5.6+20040907i
On Wed, Nov 10, 2004 at 04:42:51PM -0800, Eric Blossom wrote:
> Thanks to some serious SSE and 3DNow! hacking by Stephane Fillod, we
> now have a much faster version of the complex/complex/complex dot
> product function. This function is at the bottom of the
> gr.freq_xlating_fir_filter_ccc function. This function performs
> channel selection and digital downconversion on complex data. This is
> really handy when using the USRP, since we're dealing with complex
> data on the host.
>
> Below is a selection of benchmark results on different machines. One
> thing that I find interesting is the wide variation between the
> generic and SIMD times as a function of machine microarchitecture.
> The generic implementations are C++ with partial loop unrolling.
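
For readers who have not looked at the code, here is a minimal sketch of what a generic complex/complex/complex dot product with partial loop unrolling might look like. This is not the GNU Radio source: the name dotprod_ccc_generic, the unroll factor of two, and the plain (non-conjugated) product are all assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper (not the GNU Radio implementation): returns
// sum(input[i] * taps[i]) for i in [0, n), with no conjugation.
std::complex<float> dotprod_ccc_generic(const std::complex<float>* input,
                                        const std::complex<float>* taps,
                                        std::size_t n)
{
    assert(n == 0 || (input != nullptr && taps != nullptr));

    // Two independent accumulator pairs, so that consecutive
    // multiply-adds do not all depend on the same running sum.
    float re0 = 0.0f, im0 = 0.0f;
    float re1 = 0.0f, im1 = 0.0f;

    std::size_t i = 0;
    // Partially unrolled main loop: two complex products per iteration.
    for (; i + 1 < n; i += 2) {
        re0 += input[i].real() * taps[i].real() - input[i].imag() * taps[i].imag();
        im0 += input[i].real() * taps[i].imag() + input[i].imag() * taps[i].real();

        re1 += input[i + 1].real() * taps[i + 1].real()
             - input[i + 1].imag() * taps[i + 1].imag();
        im1 += input[i + 1].real() * taps[i + 1].imag()
             + input[i + 1].imag() * taps[i + 1].real();
    }

    // Leftover element when n is odd.
    if (i < n) {
        re0 += input[i].real() * taps[i].real() - input[i].imag() * taps[i].imag();
        im0 += input[i].real() * taps[i].imag() + input[i].imag() * taps[i].real();
    }

    return std::complex<float>(re0 + re1, im0 + im1);
}
```

Each complex product costs four real multiplies and two real adds, which is the work the SSE and 3DNow! versions spread across packed registers.
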
Thanks for the results, Eric. I've posted them on the wiki at
http://comsec.com/wiki?SIMDNumberCrunching
Here are mine on an AMD Duron 1.0 GHz, which does okay as long as the
data set does not exceed the tiny L2 cache (64 KiB, doh!) by much:
address@hidden tests]$ ./benchmark_dotprod_ccc
generic: taps: 256 input: 4e+07 cpu: 150.394 taps/sec: 6.809e+07
3DNow!Ext: taps: 256 input: 4e+07 cpu: 45.439 taps/sec: 2.254e+08
3DNow!: taps: 256 input: 4e+07 cpu: 49.981 taps/sec: 2.049e+08
SSE: taps: 256 input: 4e+07 cpu: 56.873 taps/sec: 1.8e+08
> taps = number of filter taps
> input = number of input samples
> cpu = combined user+sys cpu time
> taps/sec = derived performance measure. Higher is better.
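
The legend does not spell out how taps/sec is derived, but the posted numbers are consistent with taps * input / cpu. A small sketch under that assumption follows; the function names taps_per_sec and speedup are made up for illustration and are not part of the benchmark program.

```cpp
#include <cassert>
#include <cmath>

// Assumed formula, inferred from the posted figures rather than from the
// benchmark source: tap multiply-accumulates per second of CPU time.
double taps_per_sec(double n_taps, double n_input, double cpu_seconds)
{
    assert(cpu_seconds > 0.0);
    return n_taps * n_input / cpu_seconds;
}

// Hypothetical helper: how many times faster the SIMD run is than the
// generic run on the same machine, using the "cpu" column.
double speedup(double generic_cpu_seconds, double simd_cpu_seconds)
{
    assert(simd_cpu_seconds > 0.0);
    return generic_cpu_seconds / simd_cpu_seconds;
}
```

For the Duron generic run, taps_per_sec(256, 4e7, 150.394) reproduces the reported 6.809e+07 to the printed precision. The same ratio of cpu times gives the generic-to-SIMD speedup for any of the machines listed.
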
>
>
> === This machine is a Pentium M (1.4 GHz) ===
>
> address@hidden tests]$ ./benchmark_dotprod_ccc
> generic: taps: 256 input: 4e+07 cpu: 121.476 taps/sec: 8.43e+07
> SSE: taps: 256 input: 4e+07 cpu: 39.010 taps/sec: 2.625e+08
>
> === This machine is an Athlon MP 1800+ (1.5 GHz) ===
>
> address@hidden tests]$ ./benchmark_dotprod_ccc
> generic: taps: 256 input: 4e+07 cpu: 118.090 taps/sec: 8.671e+07
> 3DNow!Ext: taps: 256 input: 4e+07 cpu: 29.705 taps/sec: 3.447e+08
> 3DNow!: taps: 256 input: 4e+07 cpu: 33.213 taps/sec: 3.083e+08
> SSE: taps: 256 input: 4e+07 cpu: 37.242 taps/sec: 2.75e+08
>
> === This machine is a Pentium 4 (1.7 GHz) ===
>
> address@hidden tests]$ ./benchmark_dotprod_ccc
> generic: taps: 256 input: 4e+07 cpu: 156.850 taps/sec: 6.529e+07
> SSE: taps: 256 input: 4e+07 cpu: 23.241 taps/sec: 4.406e+08