Re: [lwip-users] Zero Copy Ethernet interface


From: Andrew Dennison
Subject: Re: [lwip-users] Zero Copy Ethernet interface
Date: Fri, 21 Sep 2007 09:36:01 +1000

On 9/21/07, Goldschmidt Simon <address@hidden> wrote:
> > Andrew Dennison wrote:
> > >> [re andrew's mail:]
> > >>
> > >>> input_thread_loop:
> > >>>     pbuf_alloc() 1514 bytes
> > >>>     pass pbuf to driver and block waiting for packet then DMA from device
> > >>>     pbuf_realloc() <- trim to actual length
> > >>>     netif->input()
> > [snip]
> > > I'm interested to hear if you have ideas on how to improve my implementation.
> >
> > I don't know if this is what Simon is alluding to, but if you
> > get two receives in quick succession (which is surely quite
> > likely), the way you describe your implementation would
> > result in the second having a good chance of getting dropped
> > because another pbuf wasn't ready yet. Point being, there's a
> > window where no packet buffer is available.

I'm using an external MAC, as alluded to in my original email. It's
an ax88796b, which gives me a 16kB packet buffer, so I can afford to be
less concerned about latency. I also wrote the driver from scratch to
take advantage of some extra features (circular TX buffer) and to get
reasonable DMA support.

In most cases my packet input thread will be blocked in the driver
with a pbuf available for use in the Rx interrupt. The only userspace
overhead is the pbuf_alloc() and pbuf_realloc(). For now my DMA
implementation takes an additional interrupt to set up DMA for the next
payload in the chain, but I'll add scatter/gather at some stage to
reduce the interrupt load.
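
As a rough sketch only, one iteration of the input loop described above might look like the following. To keep it self-contained it does not use the real lwIP API: `struct toy_pbuf`, `rx_wait_fn`, `input_fn` and the `demo_*` helpers are all hypothetical stand-ins, with comments marking where pbuf_alloc(), pbuf_realloc() and netif->input() would sit in a real port.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FRAME_LEN 1514u  /* full-size Ethernet frame without FCS */

/* Toy stand-in for an lwIP pbuf: one contiguous buffer (assumption). */
struct toy_pbuf {
    uint16_t len;
    uint8_t  payload[MAX_FRAME_LEN];
};

/* Hypothetical driver hook: blocks until a frame has been DMA'd into buf,
 * then returns the received length. */
typedef uint16_t (*rx_wait_fn)(uint8_t *buf, uint16_t cap, void *ctx);

/* Hypothetical stack entry point: takes ownership of the buffer. */
typedef void (*input_fn)(struct toy_pbuf *p, void *ctx);

/* One pass of the input thread: allocate, block in the driver, trim, hand off. */
int input_thread_step(rx_wait_fn rx_wait, input_fn input, void *ctx)
{
    struct toy_pbuf *p = malloc(sizeof *p);   /* where pbuf_alloc() would go */
    if (p == NULL)
        return -1;
    p->len = MAX_FRAME_LEN;

    /* Thread blocks here with a buffer already posted for the Rx interrupt. */
    uint16_t got = rx_wait(p->payload, p->len, ctx);
    if (got == 0 || got > MAX_FRAME_LEN) {
        free(p);
        return -1;
    }

    p->len = got;                             /* where pbuf_realloc() would trim */
    input(p, ctx);                            /* where netif->input() would go */
    return 0;
}

/* Demo stand-ins so the sketch can be exercised without hardware. */
struct demo_ctx {
    uint16_t last_len;
    unsigned delivered;
};

uint16_t demo_rx_wait(uint8_t *buf, uint16_t cap, void *ctx)
{
    (void)ctx;
    uint16_t n = 60;                          /* minimum frame size without FCS */
    assert(n <= cap);
    memset(buf, 0xAB, n);
    return n;
}

void demo_input(struct toy_pbuf *p, void *ctx)
{
    struct demo_ctx *d = ctx;
    d->last_len = p->len;
    d->delivered++;
    free(p);
}
```

Between input() returning and the next allocation there is a short window with no buffer posted; in this design the MAC's on-chip packet buffer is what absorbs frames arriving in that window.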

I also have the option of enabling flow control in the MAC to reduce
the chance of overflowing the packet buffer, but I'm not sure if this
will really help as the sending device may have flow control turned
off - this seems to be the default for some drivers on other OSes.

As far as I can tell the only optimisation I could make is to manage
the input pbuf chains myself to avoid freeing unused fragments in the
pbuf chain:
1) adjust the pbuf to the received length, keeping any pbufs trimmed from the chain
2) pass adjusted pbuf to stack
3) alloc additional pbufs and chain with pbufs reclaimed above

There will also be some tuning of lwIP parameters required, but that
is a job for later.

> Yep, that's exactly the same problem I'm having, too. Although
> our hardware has multiple onchip-buffers, we're sometimes still
> not fast enough to receive at full wire-speed, resulting in TCP
> retransmissions :-(
>
> The only way to work around that is to start DMA'ing as fast as
> possible after receiving the packet (e.g. at interrupt level, but
> that is sometimes not possible, either).
>
> Anyway, that's the downside of not having a DMA enabled MAC!

I'll have to write a driver for an on-chip MAC at some stage for one
of our product variants, but it has scatter/gather DMA so that should
help.
