It looks like the OFDM blocks use a fair number of multiply and
multiply const blocks:
gr-digital$ find -name "*ofdm*" | xargs grep mult
./python/ofdm_sync_ml.py: self.mixer = gr.multiply_cc();
./python/ofdm_sync_ml.py: # The output theta of the correlator above is multiplied with this correlation to
./python/ofdm_sync_ml.py: self.mul = gr.multiply_ff()
./python/ofdm_receiver.py: self.chan_filt = gr.multiply_const_cc(1.0)
./python/ofdm_receiver.py: self.sigmix = gr.multiply_cc()
./python/ofdm_sync_pn.py: self.corr = gr.multiply_cc();
./python/ofdm_sync_pn.py: self.square = gr.multiply_ff()
./python/ofdm_packet_utils.py: to get to a multiple of 8.
./python/ofdm_packet_utils.py: # pad to multiple of 8
./python/ofdm_packet_utils.py: up being a multiple of 512 bytes when sent across the USB. We
./python/ofdm_packet_utils.py: is a multiple of 128 samples.
./python/ofdm.py: @param pad_for_usrp: If true, packets are padded such that they end up a multiple of 128 samples
./python/ofdm.py: self.scale = gr.multiply_const_cc(1.0 / math.sqrt(self._fft_length))
./python/ofdm_sync_pnac.py: self.corr = gr.multiply_cc();
I bet many of these multiply consts could be simplified out. Even so,
little things like replacing the multiplier implementation with one
optimized for the SIMD unit make a big difference, especially on ARM,
where NEON >> FPU.
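As a rough illustration of what "simplified out" could mean, here is a small plain-Python sketch (not GNU Radio code). The helper names, the taps, and the FFT length are made up for the example. It shows that a multiply const of exactly 1.0 is a no-op, and that a constant gain such as 1/sqrt(fft_length) can be folded into the coefficients of an adjacent linear stage instead of running as its own block. Whether that is actually possible for each block in the grep output above depends on what sits next to it in the flowgraph.

```python
import math

def multiply_const(samples, k):
    """Stand-in for a multiply const block: scale every sample by k."""
    return [k * s for s in samples]

def fir(samples, taps):
    """Direct-form FIR filter, used here as a stand-in for any linear stage."""
    out = []
    for n in range(len(samples)):
        acc = 0j
        for i, t in enumerate(taps):
            if n - i >= 0:
                acc += t * samples[n - i]
        out.append(acc)
    return out

fft_length = 512  # hypothetical value, for illustration only
scale = 1.0 / math.sqrt(fft_length)
samples = [complex(1, -2), complex(0.5, 0.25), complex(-3, 1), complex(2, 2)]
taps = [0.25, 0.5, 0.25]  # hypothetical taps

# A gain of exactly 1.0 leaves the samples unchanged, so the stage can go.
identity_out = multiply_const(samples, 1.0)

# Two stages: filter, then scale with a separate multiply const.
two_stage = multiply_const(fir(samples, taps), scale)

# One stage: the same gain folded into the filter taps.
one_stage = fir(samples, [scale * t for t in taps])

# Largest difference between the two results (rounding error only).
max_err = max(abs(a - b) for a, b in zip(two_stage, one_stage))
```

Folding the gain into existing coefficients costs nothing at run time, since the scaled taps are computed once up front and one full pass over the sample stream disappears.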
There are multiply and multiply const blocks for floats that have been
optimized in my gr-basic branch. You may want to try using those blocks
instead: http://gnuradio.org/cgit/jblum.git/log/?h=gr_basic
-Josh