Re: [lwip-users] Automatic Rx DMA ring replenish

From: Bill Auerbach
Subject: Re: [lwip-users] Automatic Rx DMA ring replenish
Date: Thu, 3 Nov 2011 08:24:35 -0400



I haven’t benchmarked this, so I can’t provide factual data, but I’ve done a lot of optimization and tweaking of lwIP to improve bandwidth, and my study of pbufs and memory pools did not show a need for improvement considering all of the other things required to handle a TCP connection.  pbuf_alloc of a PBUF_POOL pbuf doesn’t use a lot of runtime when the allocation fits in one pbuf, and memp_malloc and memp_free run only a few lines of simple C code to complete.  pbuf_free also does very little on a single (unchained) pbuf.  You are in a position to test for the actual improvement.  I would be curious (and surprised) if the overall performance increases significantly, or even noticeably.

From my experience, there are several other areas where improvements significantly increase performance.  One of them I submitted a patch for, and it is already included in lwIP; others are optimizing your Ethernet port, improving inet_chksum, and using zero-copy TX and RX.  For me, optimizing memcpy (using assembly code, unrolled loops and indexed addressing) helped a good bit as well.




From: address@hidden [mailto:address@hidden On Behalf Of address@hidden
Sent: Wednesday, November 02, 2011 1:10 PM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] Automatic Rx DMA ring replenish


On 30 Oct 2011 18:13 "Simon Goldschmidt" <address@hidden> wrote:

"address@hidden" <address@hidden> wrote:

What if I make the Rx DMA buffer descriptor ring large enough to hold all POOL pbufs? At start-up all POOL pbufs are allocated and put in the Rx DMA ring.
pbuf_free() is modified so that whenever a POOL pbuf is freed it is immediately put in the Rx DMA ring.

This should improve performance, as well as simplify the ethernet driver a bit.

If it works for your hardware, good enough. The modification would probably be calling your custom free function instead of memp_free from pbuf_free.

However, I don't think that will work with many DMA enabled MACs: the ones I've worked with have the RX descriptors in internal memory, so the ring can't be made larger. And because RX packets are sometimes buffered (i.e. TCP OOS data), you will want to have many more PBUF_POOL pbufs than fit into your DMA ring (depending on its size and the expected throughput, of course).

However, I guess providing a way to change memory allocation/deallocation to use custom functions would be a good thing to support many different types of zero copy MACs without having to change the lwIP code for every hardware, so I guess it's well worth a try for your target!


I have tested this method on my hardware and it works nicely.
This is my suggestion for how it can be implemented in LwIP:

In pbuf.c, function pbuf_free(), change this:

   /* is this a pbuf from the pool? */
   if (type == PBUF_POOL) {
     memp_free(MEMP_PBUF_POOL, p);

To this:

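   /* is this a pbuf from the pool? */
   if (type == PBUF_POOL) {
     /* offer the pbuf to the Ethernet driver first; free it only if declined */
     if( !DMA_RING_REPLENISH( p ) ) {
       memp_free(MEMP_PBUF_POOL, p);
     }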

In opt.h, add this:

#define DMA_RING_REPLENISH( p ) 0

In lwipopts.h, the feature can be enabled by a define like this:

#define DMA_RING_REPLENISH( p ) MAC_ReplenishRx( p )
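
A minimal, self-contained sketch of how the two defines could interact. One caveat is added here that the proposal does not spell out: opt.h includes lwipopts.h first and wraps its defaults in #ifndef, so the default would need that guard to avoid redefining the macro. The struct pbuf stand-in, mock_memp_free(), free_pool_pbuf() and the one-slot "ring" are hypothetical names used only for illustration; they are not lwIP code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lwIP's struct pbuf; only what the sketch needs. */
struct pbuf { void *payload; };

static int memp_free_calls;       /* counts fallbacks to the pool */
static int ring_slots_free = 1;   /* pretend the Rx ring has one empty slot */

/* Hypothetical driver hook: nonzero means the driver kept the pbuf. */
static int MAC_ReplenishRx(struct pbuf *p)
{
  (void)p;
  if (ring_slots_free == 0) {
    return 0;                     /* ring full, decline the pbuf */
  }
  ring_slots_free--;
  return 1;
}

/* lwipopts.h side: the port opts in. */
#define DMA_RING_REPLENISH(p) MAC_ReplenishRx(p)

/* opt.h side: the default applies only when the port did not define it. */
#ifndef DMA_RING_REPLENISH
#define DMA_RING_REPLENISH(p) 0
#endif

/* Mock of memp_free(MEMP_PBUF_POOL, p) so the sketch runs stand-alone. */
static void mock_memp_free(struct pbuf *p) { (void)p; memp_free_calls++; }

/* The PBUF_POOL branch of pbuf_free(), reduced to the proposed change. */
static void free_pool_pbuf(struct pbuf *p)
{
  assert(p != NULL);
  if (!DMA_RING_REPLENISH(p)) {
    mock_memp_free(p);
  }
}
```

With the default define in effect the macro expands to 0, so the compiler can drop the extra test and ports that do not use the feature pay nothing for it.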

The way it works is that whenever a PBUF_POOL pbuf is deallocated, it is first offered to the Ethernet driver via the DMA_RING_REPLENISH() macro. If the Ethernet driver wants the pbuf, it returns true. If, however, the Ethernet driver does not want the pbuf at this time (the DMA ring is already full), the pbuf is freed normally using memp_free().

By offering the pbuf to the Ethernet driver directly, the entire memp_free(), context switch, pbuf_alloc() sequence is bypassed, saving CPU cycles.
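
For illustration, the driver side might look roughly like the sketch below. This is not the code I run on my hardware: the ring size, the struct pbuf stand-in, the software ring bookkeeping and MAC_TakeRx() are all assumptions made so the example compiles on its own. A real driver would also write the buffer address into the hardware descriptor and hand ownership to the DMA at the marked spot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for lwIP's struct pbuf; only what the sketch needs. */
struct pbuf { void *payload; };

#define RX_RING_SIZE 4            /* assumed descriptor count, hardware specific */

/* Hypothetical software view of the Rx descriptor ring. */
static struct pbuf *rx_ring[RX_RING_SIZE];
static unsigned rx_head;          /* next slot to refill */
static unsigned rx_count;         /* slots currently holding a pbuf */

/* Offer a freed PBUF_POOL pbuf to the driver.
 * Returns 1 if the pbuf went into the ring, 0 if the ring is full and
 * the caller should fall back to memp_free(). */
int MAC_ReplenishRx(struct pbuf *p)
{
  assert(p != NULL);
  if (rx_count == RX_RING_SIZE) {
    return 0;
  }
  rx_ring[rx_head] = p;
  /* Real hardware: program the descriptor with p->payload and give it to the DMA. */
  rx_head = (rx_head + 1) % RX_RING_SIZE;
  rx_count++;
  return 1;
}

/* Rx path: remove the oldest pbuf from the ring once the DMA has filled it.
 * Returns NULL when the ring holds no pbufs. */
struct pbuf *MAC_TakeRx(void)
{
  struct pbuf *p;
  unsigned tail;

  if (rx_count == 0) {
    return NULL;
  }
  tail = (rx_head + RX_RING_SIZE - rx_count) % RX_RING_SIZE;
  p = rx_ring[tail];
  rx_ring[tail] = NULL;
  rx_count--;
  return p;
}
```

Two things to keep in mind with this approach: since pbuf_alloc() is bypassed, the driver has to reset the pbuf fields that pbuf_alloc() would normally initialize before the buffer is reused, and if pbuf_free() can run in a different context than the Rx path, the ring bookkeeping needs to be protected accordingly.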

Timmy Brolin
