[Discuss-gnuradio] Diagnosing why a flowgraph occasionally stops processing samples

Hi everybody,

I have a very complex flowgraph that sometimes simply stops processing received samples. It doesn't crash, and the transmit side of the flowgraph remains fully operational. No exceptions are thrown, and nothing indicates that any threads are dying.

The flowgraph makes use of a lot of custom out-of-tree (OOT) modules. I've seen similar behavior in situations where I've screwed up in my OOT modules. For example, if a sync block calls set_min_noutput_items with a large value and the upstream block can only produce a small number of samples, it seems possible to stall the flowgraph.
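
To make the suspected failure mode concrete, here is a minimal toy model in Python. It is not GNU Radio's scheduler: the function name simulate and the parameters capacity, upstream_chunk, and min_items are made up for illustration, and it assumes the downstream block is only allowed to run once the buffer between the two blocks holds at least min_items samples.

```python
def simulate(capacity, upstream_chunk, min_items, max_rounds=1000):
    """Toy two-block pipeline (hypothetical model, not GNU Radio code).

    capacity       -- items the buffer between the two blocks can hold
    upstream_chunk -- most items the upstream block produces per call
    min_items      -- items the downstream block insists on per call
    Returns (total_items_consumed, stalled).
    """
    buffered = 0
    consumed = 0
    for _ in range(max_rounds):
        progressed = False

        # Upstream writes as much as it can into the remaining space.
        produced = min(upstream_chunk, capacity - buffered)
        if produced > 0:
            buffered += produced
            progressed = True

        # Downstream runs only when its minimum is available.
        if buffered >= min_items:
            consumed += buffered
            buffered = 0
            progressed = True

        # Neither block could do anything: the pipeline is stuck.
        if not progressed:
            return consumed, True
    return consumed, False


if __name__ == "__main__":
    print(simulate(capacity=64, upstream_chunk=16, min_items=128))
    print(simulate(capacity=256, upstream_chunk=16, min_items=128, max_rounds=100))
```

In this model, a buffer of 64 items can never satisfy a minimum of 128, so the upstream block fills the buffer and then both blocks wait forever. With a 256-item buffer the same pair keeps running. Whether the real scheduler sizes buffers in a way that allows this is exactly what I can't tell.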

The source is definitely not the cause of the stoppage--this happens even in simulation where the input is just a noise source. It may take hours of running for the flowgraph to stall. Or minutes. It seems very random.

My question is: are there any tools available to help determine what's causing the stall? I've tried using GDB, but that is difficult with ~100 threads, and most of them just appear to be sitting in semaphore waits. I don't think I can deduce anything more from GDB. I really do think it's ultimately a logic issue like the one I described above, where I'm mistreating the scheduler and giving it a situation it cannot cope with.

Basically, can I poke the scheduler and say "tell me what's going on"? I'd love to get some data on the last several rounds of forecast() calls and details on the work function calls, e.g., "Block such-and-such was provided X inputs and space for Y outputs, it consumed A and produced B".
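
As a rough sketch of the kind of record I have in mind, the following Python keeps the last few work calls in a bounded buffer. Everything here is hypothetical: WorkTrace, WorkCall, record, and dump are names invented for illustration, not part of GNU Radio, and the sketch assumes each of my own blocks would call record() from inside its work function.

```python
import threading
from collections import deque, namedtuple

# One entry per work call (hypothetical record layout).
WorkCall = namedtuple("WorkCall", "block n_in n_out_space consumed produced")


class WorkTrace:
    """Keeps only the most recent `depth` work calls, oldest dropped first."""

    def __init__(self, depth=8):
        self._lock = threading.Lock()       # blocks run in separate threads
        self._calls = deque(maxlen=depth)   # bounded history

    def record(self, block, n_in, n_out_space, consumed, produced):
        with self._lock:
            self._calls.append(
                WorkCall(block, n_in, n_out_space, consumed, produced))

    def dump(self):
        """Return the retained history as readable lines, oldest first."""
        with self._lock:
            snapshot = list(self._calls)
        return [
            f"Block {c.block} was provided {c.n_in} inputs and space for "
            f"{c.n_out_space} outputs, it consumed {c.consumed} and "
            f"produced {c.produced}"
            for c in snapshot
        ]


if __name__ == "__main__":
    trace = WorkTrace(depth=2)
    trace.record("fir0", 4096, 4096, 4096, 4096)
    trace.record("sync1", 64, 8192, 0, 0)
    trace.record("fir0", 512, 4096, 512, 512)
    for line in trace.dump():
        print(line)
```

Because the history is bounded, it could be left running for hours and dumped only once the flowgraph has stalled, at which point the last few entries for each block would show who stopped consuming or producing first.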

Thanks!

Joe

From:	Joe K
Subject:	[Discuss-gnuradio] Diagnosing why a flowgraph occasionally stops processing samples
Date:	Tue, 19 Feb 2019 10:53:20 -0800