discuss-gnuradio


Re: [Discuss-gnuradio] User experience with E1x0 boards


From: Philip Balister
Subject: Re: [Discuss-gnuradio] User experience with E1x0 boards
Date: Wed, 04 May 2011 09:05:57 -0400

We should move this to the usrp-users list since this has no gnuradio content. I've added it to the cc list.

On 05/04/2011 02:01 AM, Alexander Chemeris wrote:
Philip,

On Wed, May 4, 2011 at 01:03, Philip Balister <address@hidden> wrote:
On 05/03/2011 11:25 AM, Alexander Chemeris wrote:

Hi Josh, Philip,

On Sat, Apr 23, 2011 at 17:05, Philip Balister <address@hidden> wrote:

On 04/22/2011 07:05 PM, Almohanad Fayez wrote:

I've always wondered about the design difference between the E100 and the
work you did with Chris Anderson's board ... now I know.  BTW where do you
have your driver code posted for the E100 and any documentation, if it
exists yet :) ? I found slides that you presented on April 13th.

Those slides are as recent as it gets. There may be video of that talk in a
few months.

Driver code is here:

https://github.com/balister/linux-omap-philip

We're seeking to get maximum throughput from the USRP E100. Our goal is to
collect some samples to RAM and then process them offline. Right now the
E100 can't achieve even 4 MSPS, which is not enough for us. What do you
feel is the limiting element? GPMC should be wide enough
to transfer much more data and RAM should be fast enough too. Is it
IRQ, or user-space processing?


First, do you have all the E100 kernel updates from here:

http://ettus-apps.sourcerepo.com/redmine/ettus/projects/usrpe1xx/wiki/Updating_E1XX_Boot_Files_and_Kernel_Modules

The MLO update is very significant as the L3 clock is running slow with the
original MLO. The driver updates help some.

No, we haven't updated. Thank you for pointing to this!

There are a few factors here:

1) The interface with the FPGA is still asynchronous. This limits the bus
cycle time we can use. We have spent some time looking at a synchronous
interface, but the GPMC controller does not provide a free running clock for
the FPGA. (The clock is only active during bus cycles, leaving no clocks
available to finish internal FPGA cycles.)

Interesting.
What throughput have you achieved with asynchronous GPMC? It is OK for
us if it can push >= 12 MSPS, i.e. if its throughput is 24e6 words/sec
or more.


I don't remember off the top of my head. On a loopback test, I see about 2 MSPS, which means 2 MSPS go into the FPGA and 2 MSPS come back. There is a test program that lets you set a decimation and then looks for drops, for testing one-way transfers. 90% of my work has revolved around correctness to this point.

The read and write cycle times are 17 clocks at the moment (the L3 clock rate is 166 MHz), so that is 102 ns per sample if everything else is perfect. See arch/arm/mach-omap2/board-overo.c for the GPMC config. (This setup will move to u-boot at some point.)
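As a rough check of the numbers above, here is a minimal sketch of the cycle-time arithmetic. The 17-clock cycle and 166 MHz L3 clock come from this message; the `cycles_per_sample` parameter is an assumption added for illustration, since the thread counts a sample both as one bus cycle (102 ns per sample) and as two words (12 MSPS = 24e6 words/sec). The function names are hypothetical and do not come from the driver.

```python
# Back-of-the-envelope GPMC throughput estimate.
# Values taken from the message: 17 L3 clocks per bus cycle, 166 MHz L3 clock.
L3_CLOCK_HZ = 166e6
CYCLE_CLOCKS = 17


def cycle_time_ns(clocks=CYCLE_CLOCKS, clock_hz=L3_CLOCK_HZ):
    """Duration of one asynchronous bus cycle in nanoseconds."""
    return clocks / clock_hz * 1e9


def max_sample_rate(cycles_per_sample, clocks=CYCLE_CLOCKS, clock_hz=L3_CLOCK_HZ):
    """Best-case samples per second, ignoring interrupt and DMA overhead.

    cycles_per_sample is an assumption: 1 if a whole sample moves in one
    bus cycle, 2 if I and Q each take their own cycle.
    """
    return clock_hz / (clocks * cycles_per_sample)


def clocks_needed(target_sps, cycles_per_sample, clock_hz=L3_CLOCK_HZ):
    """Bus cycle length (in L3 clocks) required to sustain target_sps."""
    return clock_hz / (target_sps * cycles_per_sample)


print(f"cycle time: {cycle_time_ns():.1f} ns")
print(f"1 cycle/sample:  {max_sample_rate(1) / 1e6:.2f} MSPS")
print(f"2 cycles/sample: {max_sample_rate(2) / 1e6:.2f} MSPS")
print(f"clocks per cycle for 12 MSPS at 2 cycles/sample: {clocks_needed(12e6, 2):.2f}")
```

Under the two-cycles-per-sample assumption, the bus alone would cap out a little under 5 MSPS, and reaching 12 MSPS would need a bus cycle of about 7 L3 clocks instead of 17.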

2) The transfer size is 2048 bytes. Larger sizes are possible, but they make
latency worse. Smaller sizes are better for latency, but max transfer rate
suffers.

We don't care about latency at all - we want to capture a lot of
samples to RAM (say, 40Mb of samples) and then slowly process them in
non-real-time. We're looking into ways to remove this 2048-byte
limitation, because it may help us get higher rates. Could you please
advise us where to look for this? Are FPGA code changes needed? We see
that the kernel driver has no notion of a 2048-byte buffer and can provide
any number of samples - does the limit reside at some higher level?


The driver does have a concept of 2048-byte buffers. This could easily go to 4K buffers since the ring buffer is allocated via get_free_page. It could be bigger if you allocated contiguous pages so you had larger than 4K physical blocks of memory. The majority of the complexity of the driver is in creating buffers usable by the DMA system that can be mapped into user space, and in dealing with cache management.


3) There is a delay getting interrupts after the FPGA signals data ready via
GPIO. This is not huge, but for high rates it hurts. I'm not certain where
the delay is (GPIO interrupt controller, or kernel interrupt handler).

Could we transfer in bigger packets to reduce GPIO overhead?

Yes. This will reduce the percentage of time you are waiting on the interrupt handlers.
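To illustrate why bigger transfers help, here is a minimal sketch of the amortization. Only the 2048-byte buffer size and the 17-clock, 166 MHz bus cycle come from this thread. The interrupt latency value, the 4 bytes per sample, and the 2 bytes moved per bus cycle are assumptions chosen for illustration, not measurements, and the function names are hypothetical.

```python
# Toy model: one interrupt per buffer, then the buffer is moved over the bus.
CYCLE_S = 17 / 166e6  # one bus cycle, from the figures quoted in the thread


def interrupts_per_second(sample_rate_sps, buffer_bytes, bytes_per_sample=4):
    """Interrupt rate when each buffer of buffer_bytes raises one interrupt.

    bytes_per_sample=4 assumes 16-bit I and 16-bit Q (an assumption).
    """
    return sample_rate_sps * bytes_per_sample / buffer_bytes


def interrupt_wait_fraction(buffer_bytes, irq_latency_s,
                            bytes_per_cycle=2, cycle_s=CYCLE_S):
    """Fraction of each buffer period spent waiting for the interrupt.

    irq_latency_s is a hypothetical figure; the real delay was not
    measured in this thread. bytes_per_cycle=2 assumes a 16-bit bus.
    """
    transfer_s = buffer_bytes / bytes_per_cycle * cycle_s
    return irq_latency_s / (irq_latency_s + transfer_s)


ASSUMED_IRQ_LATENCY_S = 10e-6  # hypothetical, for illustration only

for size in (2048, 4096, 16384):
    rate = interrupts_per_second(4e6, size)
    frac = interrupt_wait_fraction(size, ASSUMED_IRQ_LATENCY_S)
    print(f"{size:6d} bytes: {rate:8.1f} irq/s at 4 MSPS, "
          f"{frac * 100:.1f}% of time waiting")
```

In this model the waiting fraction shrinks as the buffer grows, but the bus cycle time still sets the ceiling, so larger buffers recover only the interrupt overhead.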


4) Be sure to tell UHD you want integer samples. I'm thinking even then UHD
has to swap IQ for historical reasons. (Josh, help?)

5) Anything you can do in the FPGA to reduce the sample rate helps. With the
E100 there is lots of free space in the FPGA for custom processing.

We thought about processing in the FPGA and even wrote some code, but then
decided to do everything in software - it's easier to get a powerful
enough DSP than to develop and maintain FPGA code. As I mentioned, we
plan to use the TMS320C6A8167, and then move towards C66x.


Even though the FPGA is tightly coupled to the OMAP, anything you can do in the FPGA will help :) I know it is harder to do processing in the FPGA than in a processor, but the FPGA is really good at the high-rate stuff.

6) If I was smart about the DSP (I'm not), having the DSP take the data from
the FPGA and reduce the rate should help also.

Do you mean C64x DSP in Gumstix? I'm not sure I get this.


Yes. Basically, instead of having the ARM control the interface to the FPGA, have the DSP (in the OMAP) do it. Then pass processed data to the ARM.

Philip

As always, I am very interested in ideas for improving performance.

And thank you for your help!



