Re: [Discuss-gnuradio] How to utilize multi-thread processor

From: Qing Yang
Subject: Re: [Discuss-gnuradio] How to utilize multi-thread processor
Date: Sun, 2 Sep 2012 17:22:38 +0800

Hi Tom,

We are profiling our code on a Xeon W3530 (4 cores, 8 threads) with 12 GB of memory and an N210, and have found some interesting issues.

1. The receiver works well at a 1 MHz sample rate; the system monitor shows each core 10%~20% occupied. Once we set the sample rate above 1 MHz (say 2 MHz), the program blocks (no decoding output) and only one core is 100% occupied while the others are idle. Using KCachegrind, we see that 86% of the CPU time is spent in the function "raw_peak_detector_fb::work(...)". This function is used by the first module (synchronization) of RawOFDM, so I think this is the module that chokes the system. My first step is to dig into this module and try to make it faster.

2. In the ordinary case (1 MHz), both the transmitter and the receiver call the function "gr_multiply_cc::work()" frequently, and its cost is quite high (nearly 18% of the program's CPU time). I think there are ways to speed up this function, right? Perhaps the VOLK library will help; I will try it out.
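
For reference on item 2, here is a minimal sketch of the operation I would expect a complex multiply block to perform: an element-wise product of two complex streams. The name multiply_cc is hypothetical and this is not the GNU Radio source. As far as I know, VOLK offers a SIMD kernel for the same operation (volk_32fc_x2_multiply_32fc); it is only mentioned in a comment so that the sketch builds without VOLK.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper, not GNU Radio code: out[i] = a[i] * b[i].
// A SIMD kernel such as VOLK's volk_32fc_x2_multiply_32fc is meant to
// replace a scalar loop like this one; it is not called here.
std::vector<std::complex<float>> multiply_cc(
    const std::vector<std::complex<float>>& a,
    const std::vector<std::complex<float>>& b)
{
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<std::complex<float>> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] * b[i];      // one complex multiply per sample
    }
    return out;
}
```

Each output sample costs four real multiplies and two real adds, so a vectorized kernel that handles several samples per instruction is the natural place to look for a speedup.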
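
To put the stall from item 1 in numbers, here is a rough sketch of the per-sample time budget a block has when it runs in a single thread. The function names and the cost figure in the note below are hypothetical; they are not measurements and not part of RawOFDM or GNU Radio.

```cpp
#include <cassert>

// Time available to process one sample, in nanoseconds, if a block running
// in one thread has to keep up with the incoming stream.
double per_sample_budget_ns(double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1e9 / sample_rate_hz;
}

// True if a block with the given average cost per sample (an assumed,
// separately measured figure) can sustain the sample rate on one core.
bool keeps_up(double cost_ns_per_sample, double sample_rate_hz)
{
    return cost_ns_per_sample <= per_sample_budget_ns(sample_rate_hz);
}
```

The budget is 1000 ns per sample at 1 MHz, 500 ns at 2 MHz, and 50 ns at 20 MHz. A block that needed, say, 700 ns per sample would keep up at 1 MHz but not at 2 MHz, which would be consistent with one core pinned at 100% while the others wait for data.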

Yang, Qing
Information Engineering, CUHK

2012/8/28 Tom Rondeau <address@hidden>
On Mon, Aug 27, 2012 at 7:07 AM, Qing Yang <address@hidden> wrote:
> Hi there,
> I am currently doing an OFDM transceiver project based on RawOFDM. We want to
> implement 20MHz bandwidth transmit/receive, but the RawOFDM code seems to
> support only narrow band (<1MHz). Once I set the sample rate larger than
> 1MHz, my program blocks with overrun messages (more details here
> http://lists.gnu.org/archive/html/discuss-gnuradio/2012-08/msg00069.html). I
> think the reason is that at a 20MHz sample rate, the USRP produces too much
> data for the PC to process and exhausts the PC's computation power.
> To boost the speed, I have two questions:
> 1) My CPU has 8 threads (4 cores). Can I manually dedicate one thread to
> each gr block and make it a pipelined system? Tom mentioned that GNU Radio
> uses a "thread-per-block" scheduler
> (http://lists.gnu.org/archive/html/discuss-gnuradio/2010-09/msg00274.html)
> but in my case only two threads are 100% occupied when I run the program.
> 2) Inside some blocks, we extensively use vector multiplications (e.g.,
> precoding, CFO compensation). I've heard about using SSE to boost the
> speed of vector multiplication. How can I utilize this technology in my
> program?
> Best regards,
> --
> Yang, Qing
> Information Engineering, CUHK


Yes, the default scheduler is the thread-per-block, so each block
operates in its own thread, and the OS will distribute those across
the CPUs. What you are seeing is probably that two blocks in
particular are taking a long time to process and starving the others.
So CPU affinity won't help you. From your other posts, it looks like
you are trying to profile the code. That's the better way to go;
figure out which blocks are taking the most time and try to optimize

