

Re: [Discuss-gnuradio] USB Issues

From: Michael Dickens
Subject: Re: [Discuss-gnuradio] USB Issues
Date: Sun, 15 Jan 2006 17:38:53 -0500

On Jan 14, 2006, at 6:10 PM, Eric Blossom wrote:
> Generally speaking, reliable throughput on the USRP is dominated by
> the OS's ability to deliver USB packets with small interpacket gaps.
> [snip] The hardware (if properly implemented), should be
> able to drive the USB at full speed. [snip]
> Under Linux, [snip] We keep the endpoint queue non-empty by submitting multiple
> asynchronous requests.

Agreed on all counts (including the snipped parts). My goal with my FUSB code was to deliver and retrieve as much data as possible with as little delay as possible, so as to keep OSX's internal software and hardware pipes full. Moving from synchronous transfers (in LIBUSB) to asynchronous ones (in my FUSB) gives a substantial improvement, which is no surprise. While I'm happy with a 4x increase in throughput, another 2-3x would certainly be useful to someone eventually. Bottom line from the discussion below: I can't think of anything else that would speed up FUSB transfers under MacOS X using the current code base. Thoughts? - MLD


The ::write() code has two parts: (1) the ::write() call itself; and (2) a callback that handles buffering. In (1), the code finds an available buffer (blocking if necessary until one becomes available), copies the incoming data into that buffer, then writes the copied data to the async USB pipe. When that write completes, callback (2) checks that the correct amount of data was written, then makes the buffer available for use again.
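For illustration only, here is a minimal Python sketch of that write path. It is not the FUSB code (which is C++ against the OS X USB APIs); AsyncWriter, FakePipe and the async_submit parameter are hypothetical names, and FakePipe merely stands in for the async USB pipe by completing requests in FIFO order on one worker thread. Only NUM_QUEUE_ITEMS and MAX_BLOCK_SIZE come from this message.

```python
import queue
import threading

# Defaults quoted in this message; everything else below is hypothetical.
NUM_QUEUE_ITEMS = 10
MAX_BLOCK_SIZE = 16 * 1024


class AsyncWriter:
    """Write path: a fixed pool of buffers recycled by a completion callback."""

    def __init__(self, async_submit, num_items=NUM_QUEUE_ITEMS,
                 block_size=MAX_BLOCK_SIZE):
        self._free = queue.Queue()          # buffers not currently in flight
        for _ in range(num_items):
            self._free.put(bytearray(block_size))
        self._submit = async_submit         # stand-in for the async pipe write
        self.errors = 0                     # short writes seen by the callback

    def write(self, data):
        """Part (1): copy data into free buffers and hand each one to the pipe."""
        written = 0
        while written < len(data):
            buf = self._free.get()          # blocks until a callback frees one
            chunk = data[written:written + len(buf)]
            buf[:len(chunk)] = chunk        # copy, so the caller may reuse data
            self._submit(buf, len(chunk), self._on_write_done)
            written += len(chunk)
        return written

    def _on_write_done(self, buf, requested, actual):
        """Part (2): check the byte count, then make the buffer available again."""
        if actual != requested:
            self.errors += 1
        self._free.put(buf)


class FakePipe:
    """Hypothetical stand-in for the async pipe: completes requests in order."""

    def __init__(self):
        self.received = bytearray()
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, buf, n, callback):
        self._requests.put((buf, n, callback))   # returns immediately

    def _run(self):
        while True:
            buf, n, callback = self._requests.get()
            self.received += buf[:n]             # "transmit" before recycling
            callback(buf, n, n)
            self._requests.task_done()

    def drain(self):
        self._requests.join()                    # wait for every submitted write


# Usage: three chunks pass through only two 4-byte buffers.
pipe = FakePipe()
writer = AsyncWriter(pipe.submit, num_items=2, block_size=4)
writer.write(b"hello, usrp")
pipe.drain()
print(bytes(pipe.received))   # b'hello, usrp'
```

The blocking get() on the free-buffer queue is what makes ::write() stall when every buffer is in flight, which is why NUM_QUEUE_ITEMS matters below.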

The ::read() code has three parts: (1) the ::read() call itself; (2) a thread running the async USB read code; and (3) a callback that handles buffering. In (2), the code gets an available buffer (blocking if necessary until one becomes available), then calls the async USB pipe to read data into it; this is done inside a while() loop, so it repeats as fast as the thread can execute. When a read completes, callback (3) copies the amount of data actually read into an intermediate buffer, overwriting the oldest data if necessary (and printing a warning). The ::read() call (1) simply copies data out of the intermediate buffer, blocking until any amount of data is available.
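A hypothetical Python sketch of the intermediate buffer between callback (3) and ::read() (1): a fixed-capacity ring that drops the oldest bytes (with a warning) when it overflows, and a read() that blocks until at least one byte is present and then returns whatever is there, up to the requested size. OverwritingRingBuffer and on_read_complete are illustrative names, not FUSB's. The read thread (2) is omitted; in this sketch it would just call on_read_complete() from each async read completion, and it assumes no single completed read exceeds the ring's capacity.

```python
import threading
import warnings


class OverwritingRingBuffer:
    """Intermediate read buffer: newest data wins when the reader falls behind."""

    def __init__(self, capacity):
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._start = 0                     # index of the oldest buffered byte
        self._count = 0                     # number of buffered bytes
        self._cond = threading.Condition()
        self.overflows = 0

    def on_read_complete(self, data):
        """Callback (3): append the bytes an async read actually returned."""
        if len(data) > self._capacity:
            raise ValueError("each completed read must fit in the ring")
        with self._cond:
            excess = self._count + len(data) - self._capacity
            if excess > 0:                  # overflow: drop the oldest bytes
                self.overflows += 1
                warnings.warn(f"read overflow: dropped {excess} oldest bytes")
                self._start = (self._start + excess) % self._capacity
                self._count -= excess
            end = (self._start + self._count) % self._capacity
            first = min(len(data), self._capacity - end)
            self._buf[end:end + first] = data[:first]
            self._buf[:len(data) - first] = data[first:]   # wrap-around part
            self._count += len(data)
            self._cond.notify_all()

    def read(self, max_bytes):
        """::read() (1): block until any data is buffered, return up to max_bytes."""
        with self._cond:
            while self._count == 0:
                self._cond.wait()
            n = min(max_bytes, self._count)
            first = min(n, self._capacity - self._start)
            out = bytes(self._buf[self._start:self._start + first])
            out += bytes(self._buf[:n - first])
            self._start = (self._start + n) % self._capacity
            self._count -= n
            return out


# Usage: the second completion overflows an 8-byte ring by 2 bytes.
ring = OverwritingRingBuffer(8)
ring.on_read_complete(b"abcdef")
print(ring.read(4))                  # b'abcd'
ring.on_read_complete(b"ghijklmn")   # warns: 2 oldest bytes dropped
print(ring.read(100))                # b'ghijklmn'
```

Dropping the oldest data keeps the stream current at the cost of a gap, which matches the "overwriting oldest data if necessary" behaviour described above.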

The "speed" factors are making sure that:
(1) there are enough buffers that nothing blocks (NUM_QUEUE_ITEMS);
(2) buffers are big enough to prevent blocking and overflow (MAX_BLOCK_SIZE);
(3) each async call transfers enough data to fill or clear whatever buffers MacOS X uses internally (MAX_BLOCK_SIZE);
(4) each async USB data transfer call happens often enough; and
(5) whatever code is generating the data and calling ::write() or ::read() gets enough CPU time to sustain the required data rate.

Defaults: NUM_QUEUE_ITEMS = 10; MAX_BLOCK_SIZE = 16*1024. This always results in exactly 41 underruns and overruns, as printed by the "test_usrp_standard_rx" and "test_usrp_standard_tx" executables in usrp/host/apps/ (call these GR O/U's).
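For a rough sense of scale, here is some back-of-the-envelope arithmetic (not measured data), assuming "MBps" means 10^6 bytes/s and that all buffers can be in flight at once. At the roughly 29 MBps reported below, the default pool holds only about 5.6 ms of data, and one 16 KiB transfer has to complete about every 0.56 ms on average, which is what factor (4) demands. The helper names are hypothetical.

```python
def buffered_ms(num_queue_items, max_block_size, rate_bytes_per_s):
    """Milliseconds of stream data the whole buffer pool can hold."""
    return 1000.0 * num_queue_items * max_block_size / rate_bytes_per_s


def block_period_ms(max_block_size, rate_bytes_per_s):
    """How often, on average, one block-sized async transfer must complete."""
    return 1000.0 * max_block_size / rate_bytes_per_s


RATE = 29e6  # assumption: 29 MBps read as 29 * 10**6 bytes/s

print(f"{buffered_ms(10, 16 * 1024, RATE):.2f} ms buffered")     # 5.65 ms buffered
print(f"{block_period_ms(16 * 1024, RATE):.3f} ms per block")    # 0.565 ms per block
print(1024 * (1024 * 1024) / 2**30, "GiB for setting (c)")        # 1.0 GiB for setting (c)
```

The last line shows why setting (c) below (1024 buffers of 1024*1024 bytes) is called absurd: it amounts to 1 GiB of buffering.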

(a) Increasing (1) from 2 to 10 increases throughput from about 24 MBps to 29 MBps; increasing it further gives no additional gain. There are still 41 GR O/U's. Leave this at 10 for now.

(b) For (2): at 4*1024, there are numerous read overflows (from my code) but no underflows (from my code); throughput is around 26 MBps, and there are 41 GR O/U's. At 16*1024, there are no over/underflows from my code, but still 41 GR O/U's; throughput is around 29 MBps. Increasing to 64*1024 or 640*1024 has no real effect on throughput, over/underflows, or GR O/U's.

(c) Increasing (1) to 1024 and (2) to 1024*1024 results in 32 GR O/U's, and throughput drops to about 28 MBps. Interestingly, all of the write underruns happen immediately (within 1 second), and the rest of the run (about 3.5 seconds) goes without errors. The read overruns are always spread out, regardless of (1) or (2). This is an absurd configuration, since we would never want to allocate that much DRAM for USB buffering.

(d) Increasing thread priority for (4) and (5) makes no difference.

Because ::write() does not go through an intermediate buffer while ::read() does, yet the results are identical for a given set of parameters, I believe the delays are caused by OSX and not by (4) or (5). The primary way to reduce delays inside OSX is to remove the extra CoreFoundation-layer calls by going directly to the kernel. This would reduce the number of required threads and eliminate the "RunLoop" requirement (found in LIBUSB, where it makes async calls effectively sync; my current code has it too, but runs it in a separate thread so that async calls really are async), which could only improve throughput.
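The RunLoop distinction can be illustrated with a toy event loop in Python. This is only an analogy, not CoreFoundation: RunLoop, submit_effectively_sync and fake_transfer are hypothetical names. Completions are queued and delivered only when some thread runs the loop. If the submitting thread must run the loop itself to see its completion, every "async" call waits; if a dedicated thread runs the loop, the submitter returns immediately.

```python
import queue
import threading


class RunLoop:
    """Toy loop: completion callbacks fire only when a thread runs the loop."""

    def __init__(self):
        self._events = queue.Queue()

    def post(self, callback, *args):
        """Called by the 'driver' when an async transfer completes."""
        self._events.put((callback, args))

    def run_until_one_event(self):
        """Dispatch one completion on the calling thread, blocking until one arrives."""
        callback, args = self._events.get()
        callback(*args)

    def start_thread(self):
        """Dispatch completions forever on a dedicated daemon thread."""
        def loop():
            while True:
                self.run_until_one_event()
        t = threading.Thread(target=loop, daemon=True)
        t.start()
        return t


def submit_effectively_sync(loop, start_transfer):
    """Start a transfer, then run the loop on this thread until it completes."""
    done = []
    start_transfer(lambda result: done.append(result))
    while not done:
        loop.run_until_one_event()
    return done[0]


def fake_transfer(loop, result, delay=0.01):
    """Hypothetical async transfer that 'completes' after delay seconds."""
    def start(callback):
        threading.Timer(delay, loop.post, args=(callback, result)).start()
    return start


# Effectively sync: the caller waits for its own completion.
sync_loop = RunLoop()
print(submit_effectively_sync(sync_loop, fake_transfer(sync_loop, 42)))  # 42

# Really async: a separate thread runs the loop; the caller continues at once.
async_loop = RunLoop()
async_loop.start_thread()
finished = threading.Event()
fake_transfer(async_loop, 7)(lambda result: finished.set())
finished.wait(timeout=5)
```

The second arrangement costs an extra thread, which is the overhead that going directly to the kernel would avoid.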

