Re: [Discuss-gnuradio] Complex Short/INT16 type
Wed, 9 Nov 2011 03:40:57 +0000
Three quick questions. First, does the CMake setup automatically turn on GCC
optimizations, i.e., "-O3"? Second, is there anything to be gained (or
lost) by turning on "-ftree-vectorize" and "-funsafe-math-optimizations"?
Finally, is the GCC on the E100 really CodeSourcery's arm-none-eabi-gcc (or an
upstream GNU version thereof)?
From: Nick Foster [mailto:address@hidden
Sent: Tuesday, November 08, 2011 4:10 PM
To: Nowlan, Sean; address@hidden
Subject: Re: [Discuss-gnuradio] Complex Short/INT16 type
On Tue, Nov 8, 2011 at 12:50 PM, Nowlan, Sean <address@hidden> wrote:
> So, what needs to be done? I noticed that there are already hooks for NEON in
> the volk library but no implementation (or very little... don't remember
Josh is putting together a little example that uses Volk in Gnuradio's core
blocks (add, subtract, etc.). This will eventually (hopefully) become the
replacement for much of the functionality in gnuradio-core.
We've been talking about this for a long time, and it should provide a pretty
major speedup on all platforms, but especially those for which the compiler
sucks (ARM being the worst offender). Josh's example should provide a framework
for you to work with while we get Volk fully integrated into Gnuradio "for
You can also always use Volk functions in your own custom dsp blocks to speed
them up. You can also just use Volk outside of Gnuradio if you like.
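Calling Volk from your own block can be sketched as follows. This is a simplified stand-in, not GNU Radio's actual block interface: the `work()` signature only mimics the general shape of one, and `add_32f_kernel` is a hypothetical placeholder for a Volk call such as `volk_32f_x2_add_32f_a16`, written as a plain loop so the sketch compiles without libvolk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for a Volk kernel such as volk_32f_x2_add_32f_a16:
// c[i] = a[i] + b[i]. A real Volk build would pick a SIMD or Orc version.
static void add_32f_kernel(float* c, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Simplified sketch of a block's work() method (not the real GNU Radio API):
// the block only unpacks its buffers and hands the whole chunk to the kernel
// in one call, instead of doing per-sample arithmetic in its own loop.
class add_ff_sketch {
public:
    int work(int noutput_items,
             const std::vector<const void*>& input_items,
             const std::vector<void*>& output_items)
    {
        assert(input_items.size() == 2 && output_items.size() == 1);
        const float* in0 = static_cast<const float*>(input_items[0]);
        const float* in1 = static_cast<const float*>(input_items[1]);
        float* out = static_cast<float*>(output_items[0]);
        add_32f_kernel(out, in0, in1, static_cast<std::size_t>(noutput_items));
        return noutput_items;  // number of output items produced
    }
};
```

The design point is that the block itself does no per-sample math, so swapping the placeholder for the real Volk kernel would be a one-line change.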
> My understanding of Orc is that it generates architecture-dependent vector
> processor instructions from an Orc abstraction language. Is integrating Orc
> into Volk for NEON as simple as linking into liborc with a compile switch
> indicating that we want NEON output? Are the smarts already built into the
> cmake build process?
Orc is actually a little cooler than that -- it's a runtime-compiled
architecture-independent vector assembly language. It's integrated as one
alternative architecture for implementing Volk functions. Volk has been set up
to automatically select the fastest implementation available for a given
function at runtime, so for the user it's as simple as #include <volk/volk.h>
and then volk_32f_x2_add_32f_a16(...) to implement an adder. Volk will
automatically choose the fastest implementation at runtime the first time the
function is invoked, after figuring out what architecture it's running on and
what implementations are available for that given function. If an Orc version
of a function is available, it will be automatically selected and the Orc code
will runtime-compile to vectorized NEON. You don't have to link against liborc
at all, just against libvolk. We don't have any native NEON in Volk -- we use
Orc to provide coverage on NEON platforms. We've found that Orc tends to be
around 90% as fast as good, hand-tuned assembly most of the time, and sometimes
faster. The reason we don't just use Orc for everything is that it's usually
possible to do a little better with careful optimization and compiler
intrinsics, and we were "gifted" a large library of well-optimized SSE DSP
routines to use.
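The "pick the fastest implementation the first time the function is invoked" behaviour described above can be illustrated with a function pointer that starts out aimed at a dispatcher and rebinds itself once. This is a minimal sketch under stated assumptions, not Volk's source: `cpu_has_fast_path()` is a hypothetical probe that always answers yes, and the "fast" variant is only an unrolled loop standing in for SSE or Orc-generated NEON code.

```cpp
#include <cassert>
#include <cstddef>

// Signature shared by every implementation of the hypothetical add kernel.
typedef void (*add_32f_fn)(float*, const float*, const float*, std::size_t);

// Portable fallback, always available.
static void add_32f_generic(float* c, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

// Stand-in for a faster variant (SIMD or Orc in a real library).
static void add_32f_unrolled(float* c, const float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i] = a[i] + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i) c[i] = a[i] + b[i];  // leftover samples
}

// Hypothetical capability probe; real code would inspect the running CPU.
static bool cpu_has_fast_path() { return true; }

static void add_32f_dispatch(float*, const float*, const float*, std::size_t);

// Public entry point: initially points at the dispatcher.
static add_32f_fn add_32f = add_32f_dispatch;

// First call only: choose an implementation, rebind the pointer, forward.
static void add_32f_dispatch(float* c, const float* a, const float* b, std::size_t n)
{
    add_32f = cpu_has_fast_path() ? add_32f_unrolled : add_32f_generic;
    add_32f(c, a, b, n);
}
```

After the first call every later call goes straight through the rebound pointer, so the selection cost is paid once. A real library would also have to guard that first call against concurrent use from several threads.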
> Can I drop Philip's _fff and _ccf filters into volk and hit "go?" (I know
> there's more nuance to it, but if the combination of integrating Orc code and
> NEON FIR filter code that's already written gets me 90% of the way there, I'd
> be VERY happy!)
You can, but the _fff and _ccf filters are already implemented and working in
NEON. They were done by Phil before Volk was integrated, so they're written in
assembly in the filter core. They are also automatically selected at runtime,
so they should be "just working" for you already. Eventually we'll pull the
assembly implementations out and put
them into Volk.
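For reference, the arithmetic that the NEON filter cores accelerate is an ordinary dot product per output sample. The plain C++ sketch below assumes the usual suffix convention (input, output, tap types: `fff` is float in, float out, float taps; `ccf` is complex in, complex out, float taps) and assumes the taps are already stored in the order they are applied to the input. It is not the gnuradio-core filter code.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// One output sample of an fff FIR: dot product of ntaps input samples
// starting at `in` with the taps.
float fir_fff_dot(const float* in, const float* taps, std::size_t ntaps)
{
    float acc = 0.0f;
    for (std::size_t k = 0; k < ntaps; ++k)
        acc += in[k] * taps[k];
    return acc;
}

// One output sample of a ccf FIR. Real taps scale I and Q independently,
// so this is simply two real dot products.
std::complex<float> fir_ccf_dot(const std::complex<float>* in,
                                const float* taps, std::size_t ntaps)
{
    float acc_i = 0.0f, acc_q = 0.0f;
    for (std::size_t k = 0; k < ntaps; ++k) {
        acc_i += in[k].real() * taps[k];
        acc_q += in[k].imag() * taps[k];
    }
    return std::complex<float>(acc_i, acc_q);
}
```

Because the `ccf` case splits into independent real accumulations, it maps naturally onto SIMD lanes.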
If you send me your flowgraph, I'll take a look at it on an E100 and see if I
can get some things optimized.
> From: Nick Foster address@hidden
> Sent: Tuesday, November 08, 2011 1:27 PM
> To: address@hidden
> Cc: address@hidden; Nowlan, Sean
> Subject: Re: [Discuss-gnuradio] Complex Short/INT16 type
> Sean, with all the talk about optimization for ARM, the first thing I
> would do is start to integrate Volk with existing floating-point
> blocks. Stock GCC is very, very bad at vectorizing for the NEON SIMD
> unit -- even when hardware floating point is used in GCC, most float
> instructions end up allocated to the VFP rather than the NEON unit.
> You might find an easy 2x-3x improvement just by doing the heavy
> lifting in Volk rather than in C++. All of the Orc functions in Volk
> will work for NEON. There's no FIR filter in Orc right now (need to
> get accumulators working properly in Orc), but Philip Balister already
> wrote NEON FIR filter cores for the _fff and _ccf FIR filters.
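To make the point about stock GCC concrete, and to tie it back to the flags asked about at the top of the thread, here is the kind of loop GCC's auto-vectorizer can handle: unit stride, a simple counted loop, and buffers promised not to overlap. The command line in the comment is an assumed example, not the project's confirmed build settings. Two documented GCC details are relevant: `-O3` already turns on `-ftree-vectorize`, and on 32-bit ARM GCC does not auto-vectorize floating-point operations onto NEON unless `-funsafe-math-optimizations` is also given, because NEON is not fully IEEE 754 compliant (denormals are flushed to zero).

```cpp
#include <cassert>
#include <cstddef>

// out[i] += a[i] * b[i], written to be friendly to GCC's auto-vectorizer.
// __restrict (a GCC/Clang extension) promises the buffers do not overlap.
//
// Assumed example invocation (add whatever -mcpu/-mfloat-abi the target uses):
//   g++ -O3 -mfpu=neon -funsafe-math-optimizations -c mac.cpp
// Without -funsafe-math-optimizations, GCC keeps this float loop scalar on
// 32-bit ARM NEON targets, since NEON flushes denormals to zero.
void mac_32f(float* __restrict out, const float* __restrict a,
             const float* __restrict b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] += a[i] * b[i];
}
```

The trade-off behind the flag is precision near zero: signals that rely on denormal values may behave slightly differently once the loop runs on NEON.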
> This isn't to say that short complex wouldn't be a useful addition to
> GR. Just that it's likely going to be more work than making use of the
> existing floating-point hardware the E100 already has.
> This is work that needs to be done anyway to make ARM platforms as
> useful as possible, and we (Josh, Phil, and I) are happy to help you
> optimize your application for E100 if you give us details on how your
> application works. We're putting together a "motivating example" using
> Volk to show users how to Volkify their own blocks.
> On Tue, Nov 8, 2011 at 9:13 AM, Josh Blum <address@hidden> wrote:
>> On 11/07/2011 02:15 PM, Nowlan, Sean wrote:
>>> Hi all -
>>> I'm getting limited by the slow ARM processor in the E100 and I want
>>> to modify parts of gr-digital and gnuradio-core to support complex
>>> short/INT16 types in the modulation schemes. I suspect that it won't
>>> be as trivial as defining "typedef std::complex<short> gr_complexs;"
>>> in gnuradio-core/src/lib/runtime/gr_complex.h and doing a
>>> find-and-replace in the relevant source files. There are probably
>> It may be that simple for some blocks, like the symbol table in BPSK.
>>> issues with dynamic range that I'll have to deal with in addition to
>>> having to implement filters using fixed-point math.
>> Often blocks will need to have scale factors. Fortunately, with a FIR
>> filter, you get a free scale factor in the "filter taps".
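The dynamic-range concern and the "free scale factor in the taps" can be shown together in a fixed-point FIR. The sketch below assumes Q15 taps (tap value times 2^15), accumulates in 64 bits, and saturates rather than wraps on output; any overall gain is folded into the taps when they are quantized. The `_sss` suffix and all names are hypothetical, and this is one common fixed-point approach rather than an existing GNU Radio block.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp a wide intermediate into int16 range (saturate instead of wrapping).
static int16_t clamp16(int64_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return static_cast<int16_t>(x);
}

// Quantize float taps to Q15, folding an overall gain into them so the filter
// applies it at no extra cost. Values outside [-1, 1) are saturated.
std::vector<int16_t> taps_to_q15(const std::vector<float>& taps, float gain)
{
    std::vector<int16_t> q(taps.size());
    for (std::size_t k = 0; k < taps.size(); ++k)
        q[k] = clamp16(std::llround(static_cast<double>(taps[k]) * gain * 32768.0));
    return q;
}

// One output sample of an int16 FIR: int16 * Q15 products summed in 64 bits,
// rounded, shifted back down by 15 bits, and saturated to int16.
int16_t fir_sss_dot(const int16_t* in, const std::vector<int16_t>& q15_taps)
{
    int64_t acc = 0;
    for (std::size_t k = 0; k < q15_taps.size(); ++k)
        acc += static_cast<int32_t>(in[k]) * q15_taps[k];
    acc += int64_t(1) << 14;    // add half an LSB so the shift rounds
    return clamp16(acc >> 15);  // arithmetic shift on GCC; guaranteed since C++20
}
```

Saturating matters here: plain int16 overflow wraps a large positive sample to a large negative one, which downstream blocks would see as a sudden phase flip.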
>>> 1) Do you think I'd save anything by doing all the modulation &
>>> filtering in complex float32 and then converting at the very end?
>> It's good to make the conversion part of an operation that does
>> something useful rather than doing it just for the sake of converting,
>> like a filter that takes in floats and spits out shorts.
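Folding the conversion into a filter might look like the following: a hypothetical `fir_fsf` (float in, short out, float taps, following the same suffix convention) whose taps are pre-multiplied by the output scale, so producing int16 costs only a round-and-saturate per output sample instead of a separate pass over the data. Names and the scaling choice are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical float-in, short-out FIR. `in` must hold n + ntaps - 1 samples;
// `scaled_taps` are the filter taps already multiplied by the output scale
// (e.g. 32767 for full-scale int16), so no separate scaling pass is needed.
void fir_fsf(int16_t* out, const float* in, std::size_t n,
             const float* scaled_taps, std::size_t ntaps)
{
    for (std::size_t i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < ntaps; ++k)
            acc += in[i + k] * scaled_taps[k];
        // The only conversion work: round to nearest, then saturate.
        float r = std::nearbyint(acc);
        if (r > 32767.0f) r = 32767.0f;
        if (r < -32768.0f) r = -32768.0f;
        out[i] = static_cast<int16_t>(r);
    }
}
```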
>>> This will reduce the bandwidth requirement to the FPGA by two, but
>>> I'm afraid the float math is the true limitation.
>> The format going into the FPGA is always integer. If you pass floats
>> into the UHD, they are copy-converted from host buffer to memory
>> mapped buffers.
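Since the samples always reach the FPGA as integers, a host-side float-to-short conversion happens either way. The sketch below shows what such a copy-convert amounts to for complex samples. It is not UHD's converter code; the full-scale factor of 32767 and the round-to-nearest choice are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>

// Round and saturate one float sample (nominally in [-1.0, 1.0]) to int16.
static int16_t to_s16(float x, float scale)
{
    float r = std::nearbyint(x * scale);
    if (r > 32767.0f) r = 32767.0f;
    if (r < -32768.0f) r = -32768.0f;
    return static_cast<int16_t>(r);
}

// Sketch of an fc32 -> sc16 copy-convert: a single pass that converts each
// sample while writing it into the destination (e.g. a transport buffer).
void convert_fc32_to_sc16(std::complex<int16_t>* dst,
                          const std::complex<float>* src, std::size_t n,
                          float scale = 32767.0f)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = std::complex<int16_t>(to_s16(src[i].real(), scale),
                                       to_s16(src[i].imag(), scale));
}
```

`std::complex<int16_t>` mirrors the `std::complex<short>` typedef proposed earlier in the thread. The C++ standard leaves `std::complex` unspecified for non-floating-point types, so this relies on the standard library (libstdc++ does) accepting it as a storage type.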
>>> 2) Why is there a gr_complex_to_interleaved_short block but not
>>> a gr_complex_to_complex_short block? Would it be better if I rolled
>>> my own or just hooked up a gr_complex_to_interleaved_short block and
>>> then a deinterleave block? Or alternatively, split the complex float
>>> vector into two streams and feed them to a USRP sink block using
>> The interleaved short block is a strange hold-over from ancient
>> times. I would ignore it. I think a block such as
>> is a good idea.
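On the interleaved-versus-complex-short question: the two are the same bytes in memory. A complex short is an I value followed by a Q value, so an interleaved stream [I0, Q0, I1, Q1, ...] already is an array of complex shorts and needs no deinterleave. The plain struct below is a hypothetical stand-in for `std::complex<short>`; the `static_assert` checks the no-padding assumption the argument depends on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for std::complex<short>: I (real) then Q (imaginary).
struct sc16 {
    int16_t i;
    int16_t q;
};

// The layout argument relies on there being no padding in the struct.
static_assert(sizeof(sc16) == 2 * sizeof(int16_t), "sc16 must be exactly two int16s");

// Treat an interleaved short stream as complex shorts. No arithmetic happens:
// the bytes are already in the right order, so a byte copy is enough (and a
// consumer could in principle read the buffer in place).
void interleaved_to_sc16(sc16* dst, const int16_t* src, std::size_t n_shorts)
{
    assert(n_shorts % 2 == 0);  // every sample needs both I and Q
    std::memcpy(dst, src, n_shorts * sizeof(int16_t));
}
```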
>>> 3) What specific parts of the modulation examples or
>>> gnuradio-core do you think I need to change to support complex short
>> Probably some new sc16 filter blocks for the matched filters. I have
>> mentioned the importance of volk before.
>> The constellation stuff relies on this new constellation library in
>> gr-digital. Perhaps Ben can lean in here and offer some advice on how
>> to modify this for alternative data types.
>> The recovery stuff in the BPSK is using Tom's new gri-control-loop to
>> simplify writing things like FLLs and PLLs. That's a place to look; see
>> how the timing recovery blocks make use of it.
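The shared control-loop idea (one helper reused by FLLs, PLLs and timing recovery) boils down to a second-order loop: an integral path that tracks frequency and a proportional path that corrects phase. This is a generic textbook sketch, not the gri_control_loop API; the gains `alpha` and `beta` are plain constructor arguments, and the values mentioned below are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Generic second-order (proportional + integral) phase-tracking loop.
class second_order_loop {
public:
    second_order_loop(double alpha, double beta)
        : d_alpha(alpha), d_beta(beta), d_phase(0.0), d_freq(0.0) {}

    // Feed one phase error in radians: update the frequency estimate
    // (integral path) first, then advance and correct the phase.
    void advance(double error)
    {
        d_freq += d_beta * error;
        d_phase = wrap(d_phase + d_freq + d_alpha * error);
    }

    double phase() const { return d_phase; }
    double freq() const { return d_freq; }  // radians per sample

    // Map any angle into [-pi, pi].
    static double wrap(double x)
    {
        const double two_pi = 6.283185307179586;
        return std::remainder(x, two_pi);
    }

private:
    double d_alpha, d_beta, d_phase, d_freq;
};
```

With alpha = 0.1 and beta = 0.005, driving the loop with the wrapped difference between an input phase that advances 0.05 rad per sample and `phase()` makes `freq()` settle at 0.05 rad/sample with the phase error going to zero, which is the property that makes a second-order loop suitable for carrier tracking.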