Re: [Discuss-gnuradio] Complex Short/INT16 type
Tue, 08 Nov 2011 20:00:10 -0800
On 11/08/2011 07:40 PM, Nowlan, Sean wrote:
> 3 quick questions - first, does the cmake setup automatically turn on
> gcc optimizations, i.e, with "-O3"? Second, is there anything to be
> gained (or lost) by turning on "-ftree-vectorize" and
> "-funsafe-math-optimizations"? Finally, is the gcc on E100 really
> CodeSourcery's arm-none-eabi-gcc (or an upstream GNU version
CMake will automatically build in release mode, which gives you -O3.
Other important flags need to be specified; you can do this in one fell
swoop with a toolchain file. One is checked into the cmake/Toolchains
directory; see the comments in it for usage.
> Thanks, Sean
> -----Original Message-----
> From: Nick Foster [mailto:address@hidden
> Sent: Tuesday, November 08, 2011 4:10 PM
> To: Nowlan, Sean; address@hidden
> Subject: Re: [Discuss-gnuradio] Complex Short/INT16
> On Tue, Nov 8, 2011 at 12:50 PM, Nowlan, Sean
> <address@hidden> wrote:
>> So, what needs to be done? I noticed that there are already hooks
>> for NEON in the volk library but no implementation (or very
>> little... don't remember exactly).
> Josh is putting together a little example that uses Volk in
> Gnuradio's core blocks (add, subtract, etc.). This will eventually
> (hopefully) become the replacement for much of the functionality in
> gnuradio-core. We've been talking about this for a long time, and it
> should provide a pretty major speedup on all platforms, but
> especially those for which the compiler sucks (ARM being the worst
> offender). Josh's example should provide a framework for you to work
> with while we get Volk fully integrated into Gnuradio "for real".
> You can also always use Volk functions in your own custom dsp blocks
> to speed them up. You can also just use Volk outside of Gnuradio if
> you like.
>> My understanding of Orc is that it generates architecture-dependent
>> vector processor instructions from an Orc abstraction language. Is
>> integrating Orc into Volk for NEON as simple as linking into liborc
>> with a compile switch indicating that we want NEON output? Are the
>> smarts already built into the cmake build process?
> Orc is actually a little cooler than that -- it's a runtime-compiled
> architecture-independent vector assembly language. It's integrated as
> one alternative architecture for implementing Volk functions. Volk
> has been set up to automatically select the fastest implementation
> available for a given function at runtime, so for the user it's as
> simple as #include <volk/volk.h> and then
> volk_32f_x2_add_32f_a16(...) to implement an adder. Volk will
> automatically choose the fastest implementation at runtime the first
> time the function is invoked, after figuring out what architecture
> it's running on and what implementations are available for that given
> function. If an Orc version of a function is available, it will be
> automatically selected and the Orc code will runtime-compile to
> vectorized NEON. You don't have to link against liborc at all, just
> against libvolk. We don't have any native NEON in Volk -- we use Orc
> to provide coverage on NEON platforms. We've found that Orc tends to
> be around 90% as fast as good, hand-tuned assembly most of the time,
> and sometimes faster. The reason we don't just use Orc for everything
> is that it's usually possible to do a little better with careful
> optimization and compiler intrinsics, and we were "gifted" a large
> library of well-optimized SSE DSP routines to use.
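
A minimal sketch of the "pick the fastest implementation on the first call" idea described above. This is not Volk's actual source: every name here (my_add_32f, add_generic, add_unrolled4, cpu_has_fast_path) is hypothetical, and the "fast" variant is only an unrolled loop standing in for a SIMD or Orc kernel.

```cpp
#include <cstddef>

// Signature shared by every implementation of the adder (hypothetical).
using add_fn = void (*)(float* out, const float* a, const float* b, std::size_t n);

// Portable fallback that is always available.
static void add_generic(float* out, const float* a, const float* b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Stand-in for an architecture-specific kernel: unrolled by four,
// with a scalar loop for the leftover samples.
static void add_unrolled4(float* out, const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

// Hypothetical capability probe; a real one would query the CPU at runtime.
static bool cpu_has_fast_path() { return true; }

static void add_first_call(float* out, const float* a, const float* b, std::size_t n);

// Public entry point. It starts out pointing at the selector below.
static add_fn my_add_32f = add_first_call;

// Runs once: pick an implementation, rebind the entry point, then do the work.
static void add_first_call(float* out, const float* a, const float* b, std::size_t n) {
    my_add_32f = cpu_has_fast_path() ? add_unrolled4 : add_generic;
    my_add_32f(out, a, b, n);
}
```

The caller only ever invokes my_add_32f(out, a, b, n); after the first call the pointer goes straight to the chosen kernel, so the selection cost is paid once.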
>> Can I drop Philip's _fff and _ccf filters into volk and hit "go?"
>> (I know there's more nuance to it, but if the combination of
>> integrating Orc code and NEON FIR filter code that's already
>> written gets me 90% of the way there, I'd be VERY happy!
> You can, but the _fff and _ccf filters are already implemented and
> working in NEON. They were done by Phil before Volk was integrated,
> so they're written in assembly in the filter core. They are also
> automatically selected at runtime, so they should be "just working"
> for you already. Eventually we'll pull the assembly implementations
> out and put them into Volk.
> If you send me your flowgraph, I'll take a look at it on an E100 and
> see if I can get some things optimized.
>> Thanks, Sean
>> ________________________________________
>> From: Nick Foster address@hidden
>> Sent: Tuesday, November 08, 2011 1:27 PM
>> To: address@hidden
>> Cc: address@hidden; Nowlan, Sean
>> Subject: Re: [Discuss-gnuradio] Complex Short/INT16 type
>> Sean, with all the talk about optimization for ARM, the first thing
>> I would do is start to integrate Volk with existing floating-point
>> blocks. Stock GCC is very, very bad at vectorizing for the NEON
>> SIMD unit -- even when hardware floating point is used in GCC, most
>> float instructions end up allocated to the VFP rather than the NEON
>> unit. You might find an easy 2x-3x improvement just by doing the
>> heavy lifting in Volk rather than in C++. All of the Orc functions
>> in Volk will work for NEON. There's no FIR filter in Orc right now
>> (need to get accumulators working properly in Orc), but Philip
>> Balister already wrote NEON FIR filter cores for the _fff and _ccf
>> FIR filters.
>> This isn't to say that short complex wouldn't be a useful addition
>> to GR. Just that it's likely going to be more work than making use
>> of the existing floating-point hardware the E100 already has.
>> This is work that needs to be done anyway to make ARM platforms as
>> useful as possible, and we (Josh, Phil, and I) are happy to help
>> you optimize your application for E100 if you give us details on
>> how your application works. We're putting together a "motivating
>> example" using Volk to show users how to Volkify their own blocks.
>> On Tue, Nov 8, 2011 at 9:13 AM, Josh Blum <address@hidden> wrote:
>>> On 11/07/2011 02:15 PM, Nowlan, Sean wrote:
>>>> Hi all -
>>>> I'm getting limited by the slow ARM processor in the E100 and I
>>>> want to modify parts of gr-digital and gnuradio-core to support
>>>> complex short/INT16 types in the modulation schemes. I suspect
>>>> that it won't be as trivial as defining "typedef
>>>> std::complex<short> gr_complexs;" in
>>>> gnuradio-core/src/lib/runtime/gr_complex.h and doing a
>>>> find-and-replace in the relevant source files.
>>> It may be that simple for some blocks. Like the symbol table in
>>>> There are issues with dynamic range that I'll have to deal with in
>>>> addition to having to implement filters using fixed-point
>>> Often blocks will need to have scale factors. Fortunately, with a
>>> FIR filter, you get a free scale factor in the "filter taps"
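
A rough sketch of the "free scale factor in the taps" point, assuming 16-bit samples and Q15 taps (the taps are multiplied by 32768 before rounding). The names quantize_taps_q15 and fir_q15 are made up for illustration and are not GNU Radio or Volk functions. The extra gain is multiplied into the taps once, so the per-sample loop needs no separate scaling step.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert float taps to Q15, folding an extra gain into them.
// Values that do not fit in 16 bits are clamped.
std::vector<std::int16_t> quantize_taps_q15(const std::vector<float>& taps, float gain) {
    std::vector<std::int16_t> q;
    q.reserve(taps.size());
    for (float t : taps) {
        long v = std::lround(t * gain * 32768.0f);
        if (v > 32767) v = 32767;
        if (v < -32768) v = -32768;
        q.push_back(static_cast<std::int16_t>(v));
    }
    return q;
}

// One output sample: dot product of the taps with a window of taps.size()
// input samples. Tap ordering (reversed or not) is left to the caller.
std::int16_t fir_q15(const std::int16_t* x, const std::vector<std::int16_t>& taps) {
    std::int64_t acc = 0;  // wide accumulator so the products cannot overflow
    for (std::size_t k = 0; k < taps.size(); ++k) {
        acc += static_cast<std::int32_t>(x[k]) * taps[k];
    }
    // Round and drop the 15 fractional bits. This assumes an arithmetic
    // right shift for negative values, which mainstream compilers provide.
    acc = (acc + (1 << 14)) >> 15;
    if (acc > 32767) acc = 32767;
    if (acc < -32768) acc = -32768;
    return static_cast<std::int16_t>(acc);
}
```

For example, taps {0.25, 0.5, 0.25} with a gain of 0.5 quantize to {4096, 8192, 4096}, and a constant input of 1000 comes out as 500.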
>>>> 1) Do you think I'd save anything by doing all the
>>>> modulation & filtering in complex float32 and then converting
>>>> at the very end?
>>> It's good to make the conversion part of an operation that does
>>> something useful rather than doing it for the sake of
>>> converting. Like a filter that takes in floats and spits out
>>>> This will reduce the bandwidth requirement to the FPGA by two,
>>>> but I'm afraid the float math is the true limitation.
>>> The format going into the FPGA is always integer. If you pass
>>> floats into the UHD, they are copy-converted from host buffer to
>>> memory mapped buffers.
>>>> 2) Why is there a gr_complex_to_interleaved_short block
>>>> but not a gr_complex_to_complex_short block? Would it be better
>>>> if I rolled my own or just hooked up a
>>>> gr_complex_to_interleaved_short block and then a deinterleave
>>>> block? Or alternatively, split the complex float vector into
>>>> two streams and feed them to a USRP sink block using
>>> The interleaved short block is a strange hold-over from ancient
>>> times. I would ignore it. I think a block such as
>>> "gr_complex_to_complex_short" is a good idea.
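
A sketch of what the core of such a conversion could look like, using the gr_complexs typedef proposed earlier in the thread. The function name, the default scale of 32767, and the saturating behaviour are assumptions for illustration, not an existing GNU Radio block. Note that the C++ standard leaves std::complex over non-floating types unspecified; this sketch only constructs values and reads real() and imag(), which works with common standard libraries. Input is assumed to be finite.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using gr_complexs = std::complex<short>;  // typedef proposed in this thread

// Scale, round to nearest, and saturate one float to the 16-bit range.
static short float_to_s16(float x, float scale) {
    float v = std::round(x * scale);
    if (v > 32767.0f) return 32767;
    if (v < -32768.0f) return -32768;
    return static_cast<short>(v);
}

// Convert complex float samples to complex short samples (hypothetical helper).
std::vector<gr_complexs> complex_to_complex_short(
        const std::vector<std::complex<float>>& in, float scale = 32767.0f) {
    std::vector<gr_complexs> out;
    out.reserve(in.size());
    for (const std::complex<float>& s : in) {
        out.push_back(gr_complexs(float_to_s16(s.real(), scale),
                                  float_to_s16(s.imag(), scale)));
    }
    return out;
}
```

Saturating rather than wrapping means an over-range sample clips at full scale instead of flipping sign.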
>>>> 3) What specific parts of the modulation examples or
>>>> gnuradio-core do you think I need to change to support complex
>>>> short ints?
>>> Probably some new sc16 filter blocks for the matched filters. I
>>> have mentioned the importance of volk before.
>>> The constellation stuff relies on this new constellation library
>>> in gr-digital. Perhaps Ben can lean in here and offer some advice
>>> on how to modify this for alternative data types.
>>> The recovery stuff in the BPSK is using Tom's new
>>> gri-control-loop to simplify writing things like FLLs, PLLs.
>>> That's a place to look; see how the timing recovery blocks make
>>> use of it.
>>> _______________________________________________ Discuss-gnuradio
>>> mailing list address@hidden