discuss-gnuradio

Re: [Discuss-gnuradio] Writing SIMD code with sse


From: Eric Blossom
Subject: Re: [Discuss-gnuradio] Writing SIMD code with sse
Date: Wed, 12 Dec 2007 10:58:22 -0800
User-agent: Mutt/1.5.17 (2007-11-01)

On Wed, Dec 12, 2007 at 11:51:20PM +0530, Rohit Garg wrote:
> Hi all,
> 
> I was following the separate discussion on this list about writing
> various trig functions using vector intrinsics. I googled for it. The
> top few results I got were for "old" processors when SIMD intrinsics
> were new. The gcc documentation (my version is 4.1.2) has a list of
> intrinsics but no description, not even one line per intrinsic.

I believe those map 1-to-1 onto the actual machine instructions, so the
instruction-set references in the Intel or AMD docs describe what each
one does.
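
To illustrate that mapping, here is a minimal sketch in C. The function
name madd4 is made up for this example; the intrinsics are the standard
SSE ones from <xmmintrin.h>, and the instruction named in each comment is
the one the intrinsic normally compiles to.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps family */

/* Hypothetical example: out[i] = a[i] * b[i] + c[i] for i = 0..3. */
void madd4(const float *a, const float *b, const float *c, float *out)
{
    assert(a && b && c && out);

    __m128 va = _mm_loadu_ps(a);        /* MOVUPS: unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);        /* MOVUPS */
    __m128 vc = _mm_loadu_ps(c);        /* MOVUPS */

    __m128 prod = _mm_mul_ps(va, vb);   /* MULPS: 4 multiplies at once */
    __m128 sum  = _mm_add_ps(prod, vc); /* ADDPS: 4 adds at once */

    _mm_storeu_ps(out, sum);            /* MOVUPS: unaligned store */
}
```

The compiler still does register allocation and scheduling, so the
generated code may reorder or merge the loads, but each arithmetic
intrinsic corresponds to one instruction you can look up in the manuals.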

> As there is a need to optimize the codebase for new processors (Conroe,
> Barcelona, etc.) anyway, can you please point me to some real
> documentation on the subject? I would really appreciate any help.

I'm not sure exactly what you're looking for.  Both Intel and AMD
have manuals about optimizing code for their microarchitectures.
You'll find them somewhere on their developer sites.

Probably the biggest place that needs improvement is trig functions.
I suggest starting with sin(x), cos(x) and sincos(x) for x a scalar
float, and a related version that computes 4 in parallel for x a
vector of 4 floats.  I'd do two versions of each: SSE2 for x86 and
SSE2 for x86_64 (on x86_64 you've got 16 XMM registers to work with
instead of 8.)
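
As a rough sketch of what the 4-wide sin(x) could look like with SSE2
intrinsics: the names sin4_ps and sin_array are made up for this example,
and the method (reduce to [-pi/2, pi/2] by subtracting a multiple of pi
split into two constants, then evaluate a degree-11 Taylor polynomial) is
just one reasonable choice, not an established GNU Radio routine. A tuned
minimax polynomial would likely do better. It is only meant for modest
inputs, on the order of +/- 4*pi.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics (pulls in the SSE ones too) */

/* Hypothetical helper: sin() of 4 packed floats, for modest |x|. */
static __m128 sin4_ps(__m128 x)
{
    const __m128 inv_pi = _mm_set1_ps(0.318309886f);       /* 1/pi */
    const __m128 pi_hi  = _mm_set1_ps(3.140625f);          /* pi, high part */
    const __m128 pi_lo  = _mm_set1_ps(9.67653589793e-4f);  /* pi - pi_hi */

    /* k = round(x / pi), using the default round-to-nearest mode. */
    __m128i k  = _mm_cvtps_epi32(_mm_mul_ps(x, inv_pi));
    __m128  kf = _mm_cvtepi32_ps(k);

    /* r = x - k*pi, done in two steps to limit cancellation error. */
    __m128 r = _mm_sub_ps(x, _mm_mul_ps(kf, pi_hi));
    r = _mm_sub_ps(r, _mm_mul_ps(kf, pi_lo));

    /* sin(r) ~= r + r*r2*(c3 + r2*(c5 + r2*(c7 + r2*(c9 + r2*c11)))) */
    __m128 r2 = _mm_mul_ps(r, r);
    __m128 p  = _mm_set1_ps(-1.0f / 39916800.0f);                      /* -1/11! */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f / 362880.0f));  /*  1/9!  */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.0f / 5040.0f));   /* -1/7!  */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(1.0f / 120.0f));     /*  1/5!  */
    p = _mm_add_ps(_mm_mul_ps(p, r2), _mm_set1_ps(-1.0f / 6.0f));      /* -1/3!  */
    p = _mm_add_ps(_mm_mul_ps(_mm_mul_ps(p, r2), r), r);

    /* sin(x) = (-1)^k * sin(r): move the low bit of k into the sign bit. */
    __m128i sign = _mm_slli_epi32(_mm_and_si128(k, _mm_set1_epi32(1)), 31);
    return _mm_xor_ps(p, _mm_castsi128_ps(sign));
}

/* Hypothetical wrapper: n must be a multiple of 4; pointers may be unaligned. */
void sin_array(const float *in, float *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        _mm_storeu_ps(out + i, sin4_ps(_mm_loadu_ps(in + i)));
}
```

A cos(x) version can reuse the same reduction with an even polynomial,
and sincos(x) can share the reduction between the two. The same source
builds for both x86 and x86_64; the difference is how many values the
compiler can keep in registers.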

We need them with something close to single-precision floating point
accuracy.  You'll need to figure out what input domain you're willing to
accept; I'd say at a minimum +/- 4*pi.
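
One way to put a number on "close to single-precision accuracy" is to
count how many representable floats lie between a computed result and a
reference value (the error in ULPs). The sketch below is an assumption
about how one might measure that, not something from this thread; the
names ordered_bits and ulp_distance are made up, and it assumes IEEE 754
single-precision floats.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern to an integer that increases monotonically
 * with the float's value, so that -0.0f and +0.0f both map to 0. */
static int32_t ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);              /* well-defined type pun */
    return (i < 0) ? (INT32_MIN - i) : i;  /* flip the negative range */
}

/* Hypothetical helper: number of representable floats between a and b.
 * NaN inputs are not supported. */
int64_t ulp_distance(float a, float b)
{
    assert(a == a && b == b);              /* reject NaN */
    int64_t d = (int64_t)ordered_bits(a) - (int64_t)ordered_bits(b);
    return d < 0 ? -d : d;
}
```

Sweeping the chosen input domain (say -4*pi to +4*pi) and recording the
largest ulp_distance against a double-precision reference rounded to
float gives a single figure for comparing candidate implementations.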

> As a related question, possibly a digression, given that these
> extensions are the key to unlocking the full power of new processors and yet
> are rather low level (we are still writing trig funcs), is there any
> FLOSS library for SIMD math?

Not sure.  Please check it out and let us know what you find.
There is of course the ATLAS stuff (optimized BLAS).

Eric



