discuss-gnuradio

Re: [Discuss-gnuradio] Using volk kernels on basic operations of gr_comp


From: Douglas Geiger
Subject: Re: [Discuss-gnuradio] Using volk kernels on basic operations of gr_complex, in my own custom blocks.
Date: Sun, 28 Feb 2016 14:39:20 -0800

The phenomenon Sylvain is pointing at is basically that, as compilers improve, you should expect the 'optimized' proto-kernels to show a less dramatic improvement over the generic ones. As to your question of 'is it worth it': that comes down to a couple of things. For example, how much of an improvement do you need for it to be 'worth it' (i.e., how much is your time worth, and/or how much of a performance improvement does your application require)? Similarly, is it worth it to you to get cross-platform improvements (which is one of the features of VOLK)? Or, perhaps, is it worth it to you just to learn how to use VOLK?

A couple of thoughts here: in general, when I have a flowgraph that is not meeting my performance requirements, my first step is to do some coarse profiling (i.e. via gr-perf-monitorx) to determine whether a single block is my primary performance bottleneck. If so, that is the block I will concentrate on for optimizations (via VOLK and/or algorithmic improvements, e.g. can I turn any run-time calculations into a look-up table calculated either at compile-time or within the constructor).
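
As a rough illustration of the look-up-table idea, here is a minimal sketch that pays for std::sin once, in the constructor, so the per-sample work is reduced to a mask and a load. The class name sine_lut, the table size and the nearest-entry lookup are all assumptions made for the example; none of this is GNU Radio or VOLK API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GNU Radio or VOLK): a sine table that is
// filled once in the constructor instead of calling std::sin per sample.
class sine_lut
{
public:
    explicit sine_lut(unsigned table_bits = 10)
        : d_mask((std::size_t{ 1 } << table_bits) - 1),
          d_table(std::size_t{ 1 } << table_bits)
    {
        assert(table_bits > 0 && table_bits < 31);
        const double two_pi = 6.283185307179586;
        for (std::size_t i = 0; i < d_table.size(); ++i) {
            // Entry i holds sin(2*pi*i/N), with N the table length.
            d_table[i] = static_cast<float>(
                std::sin(two_pi * static_cast<double>(i) /
                         static_cast<double>(d_table.size())));
        }
    }

    // Per-sample cost: one mask (wraps the phase index) and one load.
    // Nearest-entry lookup only; no interpolation between entries.
    float lookup(std::size_t phase_index) const
    {
        return d_table[phase_index & d_mask];
    }

    std::size_t size() const { return d_table.size(); }

private:
    std::size_t d_mask;
    std::vector<float> d_table;
};
```

In a block, an object like this would typically be a member built in the block's constructor and then used inside work(). Whether it actually beats calling the math function directly depends on table size and cache behaviour, so it is worth profiling both.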
 If there is no clear bottleneck, I next look a little deeper, using perf/oprofile to see which functions my flowgraph is spending a lot of time in: can I, e.g., create a faster version of some primitive calculation that all my blocks use a lot, and thereby get a speed-up across many blocks, which should translate into a faster overall application?

 Finally, if I still need more improvement, I would look at collecting many blocks together into a single, larger block. This is generally less desirable, since you now have a (more) application-specific block that is harder to re-use in later projects, but if your performance requirements drive you there, it absolutely is an option. At this point you likely have multiple operations being applied to your incoming samples, and it becomes easy to collect all of them into a single larger VOLK call (and from there, create a SIMD-ized proto-kernel that targets your particular platform). So, while code re-usability pushes you away from this scenario, it offers the greatest potential for performance improvement, and thus is where many applications with high performance requirements tend to gravitate. Ideally you can strike a balance between the two: i.e. have widely re-usable blocks, but with a set of operations inside them that can take advantage of e.g. SIMD-ized function calls to make them high-performance. Try to craft the block so that it is widely re-usable for a certain class of things (e.g. look at how the OFDM blocks are set up to be easily re-configurable for the many ways an OFDM waveform can be crafted). In the long run, having more knobs to turn to adapt your existing code base to whatever new scenario you are looking at 1/2/10 years from now is always better than a brittle solution that solves today's problem but is difficult to re-use for tomorrow's.
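
To make the "collect several operations into one call" idea concrete, here is a small sketch in plain C++. It assumes a hypothetical chain of two stages (multiply by a complex constant, then take the magnitude squared) and compares a two-pass version, which mimics two separate blocks with an intermediate buffer, against a fused single loop. The function names are invented for the example, and the fused loop is only the scalar starting point from which a SIMD proto-kernel could be written.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Two-stage version: each stage walks the whole buffer, and the first one
// writes an intermediate vector that the second one has to read back.
std::vector<float> scale_then_mag2_two_pass(const std::vector<cfloat>& in, cfloat k)
{
    std::vector<cfloat> scaled(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        scaled[i] = in[i] * k;
    }
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = std::norm(scaled[i]); // |z|^2
    }
    return out;
}

// Fused version: one pass over the input, no intermediate buffer.
std::vector<float> scale_then_mag2_fused(const std::vector<cfloat>& in, cfloat k)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const float re = in[i].real() * k.real() - in[i].imag() * k.imag();
        const float im = in[i].real() * k.imag() + in[i].imag() * k.real();
        out[i] = re * re + im * im;
    }
    return out;
}
```

The fused loop touches each input sample once and never stores the scaled complex value, which is where most of the saving would be expected to come from. The price is exactly the one described above: the combined function is specific to this particular chain of operations.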

Hope that was helpful. If you are interested in learning more about how to use VOLK - certainly have a look at libvolk.org - the documentation is (I think) fairly good at introducing the concepts and intent, as well as how the API looks/works. And certainly don't be shy about asking more questions here.

 Good luck,
  Doug

On Sun, Feb 28, 2016 at 1:58 AM, Sylvain Munaut <address@hidden> wrote:
> Just wanted to ask the more experienced users if you think this idea is
> worth a shot, or the performance improvement will be marginal.

The performance improvement depends heavily on the operation you're doing.

You can get an idea of the improvement by comparing the volk_profile
output for the generic kernel (coded in pure C) and the sse/avx ones.

For instance, on my laptop: for some very simple ones (like float
add), the generic kernel is barely slower than the SIMD one. Most likely
because it's so simple that even the compiler was able to simdize it on
its own.
But for other things (like complex multiply), the SIMD version is 10x faster ...
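
For a feel of what is being compared, here is a sketch of what "generic" loops for those two operations might look like in plain C++. These are illustrative stand-ins written for this example, not VOLK's actual kernel sources, and the function names are invented. One plausible reason for the difference is that the float add is a straight element-wise loop, while the complex multiply works on interleaved real/imaginary pairs and mixes them, which tends to need shuffles that a compiler may or may not generate well on its own.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Element-wise float add: a simple loop that compilers can often
// auto-vectorize without help.
void add_generic(float* out, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Element-wise complex multiply: each output needs both the real and the
// imaginary part of both inputs, which are stored interleaved in memory.
void multiply_generic(std::complex<float>* out,
                      const std::complex<float>* a,
                      const std::complex<float>* b,
                      std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        const float ar = a[i].real(), ai = a[i].imag();
        const float br = b[i].real(), bi = b[i].imag();
        out[i] = std::complex<float>(ar * br - ai * bi, ar * bi + ai * br);
    }
}
```

The actual numbers depend on the compiler, its flags and the CPU, so the comparison is best read from the profiling output on your own machine rather than from a sketch like this.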


Cheers,

   Sylvain




--
Doug Geiger
address@hidden
