Re: [Discuss-gnuradio] Debugging the Source of Dropped Samples

Hi Marcus,

Thanks for the detailed answer.

After a long period, I managed to work on it (in several steps) during the last couple of months.

I found the bottleneck using ControlPort. It pointed to a couple of blocks as bottlenecks (judging by how full the preceding blocks' output buffers were). Interestingly, they weren't IO-bound. I haven't investigated further, but these blocks (i) caused dropped samples, (ii) were CPU-bound, and (iii) never came anywhere near 100% in htop. My guess is that their CPU usage fluctuates heavily: they may hit 100% CPU for a very short time (which is when samples get dropped), shorter than the sampling and averaging periods of tools like htop. Optimizing those blocks solved the dropped-samples issue.
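
To illustrate the averaging effect, here is a minimal sketch with made-up numbers (not measurements from my flowgraph; the function name and the load pattern are purely hypothetical). A core that is saturated for 20 ms out of every second averages to only 2% in a monitor that reports once per second, while a 10 ms window shows the full burst:

```python
def windowed_utilization(busy_ms, window_ms):
    """Average CPU utilization (0..1) over consecutive, non-overlapping windows.

    busy_ms holds one 0/1 flag per millisecond (1 = core fully busy).
    """
    return [
        sum(busy_ms[i:i + window_ms]) / window_ms
        for i in range(0, len(busy_ms) - window_ms + 1, window_ms)
    ]


# Hypothetical load: 20 ms at 100% CPU, then 980 ms idle, repeated 3 times.
trace = ([1] * 20 + [0] * 980) * 3

coarse = windowed_utilization(trace, 1000)  # one reading per second
fine = windowed_utilization(trace, 10)      # one reading per 10 ms

print(max(coarse))  # 0.02
print(max(fine))    # 1.0
```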

I only gave kernelshark a little attention. It didn't prove to be a simple tool to use out of the box, and I don't think its docs and level of polish are good enough for this general use case. Maybe that would change if someone took it on as a project and documented how to apply it to flowgraph inspection.

On Tue, Nov 7, 2017 at 1:25 PM Marcus Müller <address@hidden> wrote:

Hi Gilad,

part of this is for the future reader of this thread, so, please, bear
with me:

On 07.11.2017 10:42, Gilad Beeri (ApolloShield) wrote:
> I have a flowgraph, that when run, no CPU core is ever close to 100%
> utilization.

Indeed, dropped samples indicate a bottleneck narrower than your USRP's
sampling rate, but that bottleneck doesn't have to be CPU overutilization!
Simplest example: take a flow graph that otherwise wouldn't produce any
problems and add a Throttle block set to half the necessary sampling rate.
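
A toy model of that situation, as a sketch only: a fixed-rate producer
feeds a bounded buffer that a slower consumer drains. The function, the
tick-based timing and all numbers are assumptions for illustration; this
is not how GNU Radio's scheduler or UHD actually account for overflows.

```python
def simulate_drops(produced_per_tick, consumed_per_tick, buffer_size, ticks):
    """Count samples lost when a bounded buffer overflows."""
    fill = 0
    dropped = 0
    for _ in range(ticks):
        # The hardware keeps producing regardless of downstream state.
        fill += produced_per_tick
        if fill > buffer_size:
            dropped += fill - buffer_size
            fill = buffer_size
        # The consumer drains at most its own rate per tick.
        fill -= min(fill, consumed_per_tick)
    return dropped


# Consumer keeps up: nothing is lost.
print(simulate_drops(1000, 1000, 8192, 100))  # 0

# Consumer throttled to half the rate: the buffer fills, then samples drop,
# even though the consumer does almost no computation.
print(simulate_drops(1000, 500, 8192, 100))
```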

Most often, I find that IO operations actually become the bottleneck
– be it that sending samples to the USRP (or receiving them) is actually
pretty time-intense, or that you need to interact with storage.

Depending on the tooling you choose, this fact might or might not be
hidden; time spent, for example "on behalf" of a thread in Kernel land,
searching for a contiguous piece of memory to give to that process, or
handling USB buffers or... might or might not be attributed to the process.

Another very classical problem is memory bandwidth and latency; as
shown by SE at this year's GRCon, chances are pretty good that you can
optimize quite a bit by co-locating connected blocks on the same CPU:
you get a caching advantage (or, rather, don't incur a disadvantage).
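
One way to try co-location is the per-block `set_processor_affinity()`
call, which takes a list of core indices. A minimal sketch: `colocate`
is a hypothetical helper, and `FakeBlock` is only a stand-in so the
snippet runs without GNU Radio installed; in a real flow graph you would
pass the actual block objects instead.

```python
def colocate(blocks, cores):
    """Pin every block in `blocks` to the same set of CPU cores."""
    for blk in blocks:
        blk.set_processor_affinity(list(cores))


class FakeBlock:
    """Stand-in for a GNU Radio block (assumption, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.mask = None

    def set_processor_affinity(self, mask):
        self.mask = mask


# Two directly connected blocks, both pinned to core 2.
filt = FakeBlock("filter")
demod = FakeBlock("demod")
colocate([filt, demod], [2])
print(filt.mask, demod.mask)
```

Which cores share a cache depends on the machine, so the core index here
is arbitrary and worth checking against your CPU topology.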

That all being said, how do you proceed?

First of all, this is one of the cases where having ControlPort is very
helpful. If you have it (with Thrift and PerfCounters enabled), you can
start the CtrlPort Performance Monitor, and see which output buffers
"stay full" all the time. The block after that is probably your bottleneck.
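
As a sketch of that reasoning: given the connections and an average
output-buffer fullness per block, look for a block whose upstream buffer
is full while its own output buffer is not. Back-pressure tends to fill
every buffer upstream of the slow block, so the last full buffer in the
chain is the interesting one. The function, the 0.9 threshold, the block
names and the numbers are all made up; you would read the real values
off the performance monitor.

```python
def suspect_bottlenecks(edges, out_fullness, threshold=0.9):
    """Return blocks fed by a full buffer whose own output is not full.

    edges: list of (upstream, downstream) block names.
    out_fullness: block name -> average output buffer fullness (0..1).
    Blocks without an output buffer (sinks) count as 0.0.
    """
    return [
        dst
        for src, dst in edges
        if out_fullness.get(src, 0.0) >= threshold
        and out_fullness.get(dst, 0.0) < threshold
    ]


# Hypothetical flow graph and readings.
edges = [
    ("usrp_source", "filter"),
    ("filter", "demod"),
    ("demod", "file_sink"),
]
out_fullness = {"usrp_source": 0.98, "filter": 0.95, "demod": 0.10}

print(suspect_bottlenecks(edges, out_fullness))  # ['demod']
```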

If you don't, try running `perf top -ag` (running it as root might help
here, since you also want to inspect kernel time; I'm not quite sure
about that, though). You should get a listing along the lines of "when
we sampled where the CPU(s) were, x % of the time they were stuck in
these functions".

I really tried, but haven't had the time to work with kernelshark. That
might really be a tool of choice here. In fact, it looks so cool that I
could imagine us one day superseding the perf counter concept with
that; who knows.

If you do happen to look into that, I'd be very happy to get some
feedback about the process, and what the problems were. I think this is
definitely something we want to enable users to do – understand not only
the behaviour of their blocks in isolation, but how a system works.
After all, one of the major "let's dream about a GNU Radio in the
future" things we're considering is making it easy to distribute a flow
graph across computers, and for that, systemic insight pretty much is a
must.

Best regards,
Marcus


From:	Gilad Beeri (ApolloShield)
Subject:	Re: [Discuss-gnuradio] Debugging the Source of Dropped Samples
Date:	Wed, 14 Feb 2018 08:08:09 +0000