Re: Maximum Number of Bins

discuss-gnuradio

From:	Marcus D. Leech
Subject:	Re: Maximum Number of Bins
Date:	Mon, 02 Nov 2020 12:48:22 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 11/02/2020 12:39 PM, Criss Swaim wrote:

Thank you Marcus & Marcus - your insights are greatly appreciated.

I am looking at the suggestions, esp. the fft conversion, and we are
considering upgrading, but need to see if the system will scale as is.
BTW, I am maintaining the code and not the original developer, so I am
not familiar with all the pieces, esp. the ones that have been working.
I look at this as an opportunity to dig deeper into GnuRadio.

1) For clarification, we have been running the 3.7 code for 4-5 years;
not sure when we upgraded to 3.7.9.  The system runs with 2 million fft
bins, but at 3 or 4 million, it fails.  M. Leech has demonstrated that
without our custom block, GnuRadio can process the high bin levels.  I
have run various configurations of our model (without the bin_sub_avg
python block) but I still receive the error.

2) Both of you have mentioned we are using old message queues...Can you
point me to some documentation that explains this. We are using the
blocks.add_const_vff and connect functions to remove background constant
(a numpy array) from the signal stream.  What would be a better
approach? I have not looked at this for several years, so I need to
refresh and this would be a good time to look at alternate options.  It
is a bit of a black box for me and I would like to research alternate
approaches as I dig into this process.

Using add_const is fine as a way to remove backgrounds.
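
As a rough illustration of why this works (not the poster's actual code): subtracting a background is the same as adding its negation, so the vector handed to an add-constant block such as the blocks.add_const_vff mentioned above would be the negated background. The sketch uses plain numpy, and the `background` and `spectrum` values are made up.

```python
import numpy as np

# Hypothetical per-bin background estimate and one incoming magnitude vector.
background = np.array([1.0, 2.0, 0.5, 0.25], dtype=np.float32)
spectrum = np.array([1.5, 2.0, 3.5, 0.25], dtype=np.float32)

# An add-constant stage adds a fixed vector to every input vector, so the
# constant it is given is the negated background.
constant = -background
cleaned = spectrum + constant

print(cleaned)
```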


3) M. Leech: you indicated that the conversion from a
stream->string->numpy array is very inefficient.  Can you point to
another approach to convert a stream to numpy array?  This is done once
every 60 minutes, but still if it could be improved, that would help.

A GNU Radio sample stream is already numpy compatible, so turning it into
a string first (maybe that's what is going into the message queue?) isn't
necessary.
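
A minimal sketch of what "already numpy compatible" means in practice, assuming the processing lives in a Python block: the `work()` method of such a block is handed numpy arrays directly. The function below only mimics that call shape so it runs without GNU Radio installed; the doubling is a placeholder operation and the buffers are stand-ins for what the scheduler would supply.

```python
import numpy as np

def work(input_items, output_items):
    """Same call shape as the work() method of a GNU Radio Python sync block."""
    in0 = input_items[0]     # already a numpy array; no string round trip
    out0 = output_items[0]
    out0[:] = in0 * 2.0      # placeholder: operate on the samples directly
    return len(out0)         # number of items produced

# Stand-in buffers; inside GNU Radio the scheduler would provide these.
inp = [np.arange(4, dtype=np.float32)]
out = [np.zeros(4, dtype=np.float32)]
produced = work(inp, out)
print(produced, out[0])
```

If the data really does arrive as a byte string, `numpy.frombuffer(data, dtype=numpy.float32)` reinterprets the bytes as an array without a per-sample conversion.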


4) Finally, I have also been looking for a change log for the 3.7 to 3.8
system. Moving from 3.6 to 3.7 was a significant change and was
wondering if 3.7 to 3.8 is the same level of effort for custom blocks.
Also, is there a timeline for 3.9?

I have one application that straddles between 3.7 and 3.9 -- there were
some gotchas, and I'm not going to recommend anyone convert to 3.9 yet.
The 3.7 to 3.8 conversion should be quite a bit smoother than 3.6 to 3.7.


Again, thanks for any guidance.

Criss Swaim
cswaim@tpginc.net
cell: 505.301.5701

On 10/31/2020 9:55 AM, Marcus Müller wrote:

Hi Craig, hi Marcus,

Also, just because I need to point that out:

GNU Radio 3.7 is really a legacy series of releases by now. You should
avoid using it for new developments - it's getting harder and harder to
even build it on modern versions of Linux. In fact, a lot of its
dependencies simply don't exist for modern systems anymore.
Developing for 3.7 is hence dangerous in terms of lifetime. That's among
the chief reasons why we released 3.8. Took us long enough!

3.7.9.2 is positively ancient. A 3.7.13.4 or later should be the oldest
version of GNU Radio you work with, even when maintaining old code.

Other than that:

Oct 29 10:45:07 tf kernel: analysis_sink_1[369]: segfault at
7f9c5a7fd000 ip 00007f9dd9361d43 sp 00007f9c5a48a638 error 6 in
libgnuradio-vandevender.so[7f9dd9336000+4d000]

This really looks like a bug in your code!
These happen easily with the older style msgq that you seem to be using
(we've basically all but removed these in current development versions
of GNU Radio), especially if directly interfacing with Python land,
which has different ideas of object lifetime than your C++ code might
have...
I think a slight reconsideration of your software architecture might
help here, but I've not seen your overall code.

With an FFT size of 2**22 bins.  This took about 20 seconds for the
FFTW3 "planner" to crunch on, but after that, worked
just fine within the flow-graph.

Not quite 20s for me, but yes, single-threaded FFT performance was about
14 transforms of that size per second, 2 threads allowed for ~23
transforms a second, 4 threads for about 28. Knowing GNU Radio, I'd
recommend you rather stick with a single thread per transform, because
other blocks also have CPU requirements (if you really want to increase
throughput, deinterleave vectors and have multiple single-threaded FFTs
run in parallel, then recombine after).
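
The deinterleave / parallel-FFT / recombine idea can be sketched outside GNU Radio as follows. This is only an illustration under stated assumptions: `parallel_fft` and `n_workers` are hypothetical names, `numpy.fft.fft` stands in for the FFT block, and a thread pool stands in for separate flow-graph branches, so it says nothing about the actual speed-up.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_fft(vectors, n_workers=2):
    # Deinterleave: vector k goes to branch k % n_workers.
    branches = [vectors[i::n_workers] for i in range(n_workers)]

    def run_branch(branch):
        # Each branch transforms its own vectors one at a time.
        return [np.fft.fft(v) for v in branch]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_branch, branches))

    # Recombine: interleave the branch outputs back into the original order.
    out = [None] * len(vectors)
    for i, branch_out in enumerate(results):
        out[i::n_workers] = branch_out
    return out

vectors = [np.arange(8, dtype=np.complex64) * (k + 1) for k in range(5)]
spectra = parallel_fft(vectors, n_workers=2)
print(len(spectra))
```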

Seeing that you only need 20 MS/s, and 14 transforms are 14 · 2²² =
7·2²³ samples a second, which is roughly 59 MS/s, I think you
are fine. If you're not, get a faster PC, honestly!
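
The arithmetic behind that estimate, spelled out: 14 transforms of 2²² bins per second is 58,720,256 samples per second (56 Mi samples/s, about 58.7 MS/s), which leaves close to a factor of three over the 20 MS/s requirement. The variable names are only for illustration.

```python
fft_size = 2 ** 22            # bins per transform
transforms_per_second = 14    # single-threaded rate reported above
required_rate = 20e6          # samples per second needed

achievable_rate = transforms_per_second * fft_size
headroom = achievable_rate / required_rate

print(achievable_rate)        # 58720256
print(round(headroom, 2))     # 2.94
```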

Best regards,
Marcus M (the younger Marcus)

On 29.10.20 23:53, Marcus D. Leech wrote:

On 10/29/2020 06:03 PM, Criss Swaim wrote:

we are running version 3.7.9.2

I constructed a simple flow-graph in GR 3.7.13.5

osmosdr_source--->stream-to-vector-->fft-->null-sink

With an FFT size of 2**22 bins.  This took about 20 seconds for the
FFTW3 "planner" to crunch on, but after that, worked
just fine within the flow-graph.

You should really keep your FFT sizes to a power-of-2, particularly at
this size range.  That's not related to your problem directly, but
power-of-2 FFTs have lower computational complexity.  Among other
things, the FFTW3 "planner" for non-power-of-2 FFTs at these
eye-watering FFT sizes seems to take a LONG time to compute a "plan".
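
One simple way to follow this advice is to round the requested bin count up to the next power of two. The helper below is a hypothetical illustration, not part of GNU Radio; for a request of 3 million bins it gives 2**22.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

fft_size = next_power_of_two(3000000)
print(fft_size)   # 4194304, i.e. 2**22
```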

You should probably look at restructuring your code--looks like you're
using message queues and marshaling your samples as *string* data
through those queues.  While it shouldn't necessarily Seg Fault, it's
not a terribly efficient way of doing things.

On 10/29/2020 11:37 AM, Marcus D. Leech wrote:

On 10/29/2020 01:17 PM, Criss Swaim wrote:

I have attached a png of the flow graph and the error msgs from the
system log are below.  These error messages are the only messages.

Oct 29 10:45:26 tf abrt-hook-ccpp[378]: /var/spool/abrt is
23611049718 bytes (more than 1279MiB), deleting
'ccpp-2020-10-27-15:30:43-28474'
Oct 29 10:45:07 tf abrt-hook-ccpp[378]: Process 329 (python2.7) of
user 1000 killed by SIGSEGV - dumping core
Oct 29 10:45:07 tf audit[370]: ANOM_ABEND auid=1000 uid=1000
gid=1000 ses=8656
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=370
comm="copy11" exe="/usr/bin/Oct 29 10:45:07 tf audit[369]:
ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=8656
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=369
comm="analysis_sink_1" exe="Oct 29 10:45:07 tf kernel: traps:
copy11[370] general protection ip:7f9e0acfdee0 sp:7f9c5a7fb590
error:0 in libpthread-2.22.so[7f9e0acf1000+18000]
Oct 29 10:45:07 tf kernel: analysis_sink_1[369]: segfault at
7f9c5a7fd000 ip 00007f9dd9361d43 sp 00007f9c5a48a638 error 6 in
libgnuradio-vandevender.so[7f9dd9336000+4d000]

Flow is USRP -> stream to vector -> fft -> complex to mag ->
bin_sub_avg -> analysis_sinkf

bin_sub_avg (python) & analysis_sinkf (c/c++) are custom blocks.

the function of Bin Sub Avg, which is written in Python, is to start
a background task which periodically (in this case hourly) samples
the input signal, calculates the background noise and subtracts it
from the signal that is passed to the analysis_sinkf module.
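
The background estimate described here can be pictured as a per-bin average over a batch of sampled vectors. The sketch below is an assumption about the general approach, not the actual bin_sub_avg code; `estimate_background` and the sample values are hypothetical.

```python
import numpy as np

def estimate_background(sampled_vectors):
    """Per-bin mean over a batch of sampled magnitude vectors."""
    return np.mean(np.stack(sampled_vectors), axis=0)

samples = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])]
background = estimate_background(samples)   # per-bin average
residual = samples[0] - background          # what would be passed downstream
print(background, residual)
```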

Analysis_sinkf monitors each bin and only when specific thresholds for
the bin are met (i.e. duration, strength) is the signal written out to
a signal file.  Signals not passing the criteria are dropped.
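
A per-bin duration and strength test of this kind might look like the following. It is a hypothetical numpy sketch, not the real analysis_sinkf (which is C/C++): `update_detections`, `strength_threshold` and `min_duration` are illustrative names, and duration is counted in consecutive vectors.

```python
import numpy as np

def update_detections(run_lengths, magnitudes, strength_threshold, min_duration):
    """Track how many consecutive vectors each bin has stayed above threshold."""
    above = magnitudes > strength_threshold
    # Extend the run where the bin is still strong, reset it where it is not.
    run_lengths = np.where(above, run_lengths + 1, 0)
    # A bin qualifies once its run is at least min_duration vectors long.
    return run_lengths, run_lengths >= min_duration

runs = np.zeros(3, dtype=np.int64)
frames = [
    np.array([5.0, 0.1, 5.0]),
    np.array([5.0, 5.0, 0.1]),
    np.array([5.0, 5.0, 5.0]),
]
for frame in frames:
    runs, keep = update_detections(runs, frame,
                                   strength_threshold=1.0, min_duration=2)
print(runs, keep)
```

Bins whose `keep` flag is false would be the ones dropped rather than written out.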

This code base has been running for over 3 years, with the original
system implementation about 8/9 years ago.

I have traced the problem to the input signal into bin_sub_avg when
the number of fft bins is 3 million (2 million works).  At 3 million
bins, any reference to the result of the delete_head() function in
the python code causes a failure.  The python code just fails
without a traceback, then the invalid data stream is passed to the
analysis_sinkf module which is C/C++ and it causes the segmentation fault.

Thus my suspicion is there is a limit in the fft block on the number
of bins it can handle and some variable is overflowing, but this is
a guess at this point.  There may be a restriction in the
gr.signature_io module, but that seems unlikely.

What version of Gnu Radio is this?
