From: Taranowski, Thomas (SWCOE)
Subject: RE: [lwip-users] lwip and/or general tcp problems
Date: Fri, 23 Mar 2007 19:39:34 -0500

I am working on a data acquisition system using an Analog
Devices' Blackfin BF537, which has a 100Mb/s MAC and utilizes a port of
lwip. The lwip port appears to be derived from STABLE-0_6_3.
My application requires high throughput on the Ethernet interface (~20Mb/s), so
I have been creating very simple applications to run on the embedded processor
with lwip to test the throughput and reliability of the setup. The
sample application on the BF537 simply creates a socket, binds it, and listens on
it. Then, in an infinite loop, it accepts a single connection and,
while that connection is open, sends large packets (1460 bytes) on the
connection. I have a simple LabVIEW application that receives the data,
and I have also been using the Wireshark analyzer to look at the
transfers. In this configuration, I am seeing the following behaviors, and I
would really appreciate some insight on them:

1) When lwip is configured to use DHCP, it is very difficult
to maintain a high throughput. In fact, the connection very frequently
times out after transferring just a few packets. I don't see much other
traffic related to having the DHCP server on the LAN, and I use a switch to
isolate the transmitting device and the receiving PC.

[TT] This could be a function of the configuration of your
DHCP server and the length of the lease that is granted during the initial DHCP
negotiation.

2) When not using DHCP, in general the connection is more
reliable. However, there appears to be a "cold start" issue,
where when the devices on the LAN (transmitter, switch, and receiving PC) are
powered on for the first time the connection has trouble establishing
itself. A few packets will transfer successfully, followed by a dropped
packet with no successful retransmissions over 30 seconds.

[TT] This is pretty hard to diagnose. To my mind, it sounds
like it could be a problem with the way the application is designed to behave at
system startup. To diagnose this more closely, sniffer logs would be needed.

3) Again without DHCP, I can observe stalls in the
transmitted data stream. Normally, packets are transmitted more than once
a millisecond (up to 8 or 10 per millisecond), but occasionally there are
periods of ~150ms where no data is transmitted. The receive window has
not closed, and there is no indication of dropped packets or retransmission in
the log file.

[TT] It could be that the transmit window (assuming TCP) is
full. It could also be related to the many #defines in opt.h that tune
the performance/space trade-off. Some sniffer logs may shed some light on the
issue. What window size does the remote end advertise?

4) Still without DHCP, I observe ~2s stalls. These
appear to be caused by >1 dropped packet, which results in the first dropped
packet being resent by fast retransmission, and all other packets being resent
by the retransmission timers.

[TT] This sounds like half-duplex Ethernet operation to me.
Make sure you don’t have any half-duplex hubs floating around on your
network. These will cause random wait times on the order you mentioned.

Can anyone confirm that any or all of these behaviors are
unexpected in a LAN environment (RTT normally <1ms)? Although
I'm new to this, it seems surprising that my little LAN with <15' CAT5
cable segments is so likely to have corrupted or lost packets.

[TT] An old hub or faulty connector can cause all sorts of
issues. I’d revert to as simple a network as possible and proceed
from there, adding segments until some bad behavior is exhibited.

Can anyone give me some guidance on what to expect regarding
lost packets?

[TT] An analysis I did some time back for an avionics
platform concluded that I could expect the PHY, at a minimum, to cause one
lost/corrupt packet per 24-hour period on a 3 in. long peer-to-peer link. It
seems to me that a dozen a day on a small network would not be unusual.

Are the recovery processes I've observed correct
behavior? Should only a single packet be resent using fast
retransmission? Is there anything inherent in the stack that could cause
brief pauses in the data stream? Why does using DHCP apparently make it
so difficult to establish and maintain a high-throughput connection,
particularly since there doesn't seem to be any other traffic on the LAN?

Apologies for the multiple questions, but I needed to start
somewhere, and I've already reached the limit of what the Analog Devices'
support engineers can help with. I can provide the log files from
Wireshark if that would be helpful, but some are very large (tens of
megabytes). I'd also be interested if anyone can suggest other resources
to further my understanding of networking and TCP/IP issues.

[TT] I’d start by locating the portions of the
capture logs that show aberrant behavior.

Thanks,
Paul Butler
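
For anyone wanting to sanity-check the figures discussed in this thread (~20Mb/s target, 1460-byte packets, ~150ms stalls, RTT around 1ms) before digging into the capture logs, here is a rough sketch in C. It is only back-of-the-envelope arithmetic, not anything taken from lwip or the BF537 port: the function names are made up for illustration, it counts payload bytes only (no header overhead), and the window bound assumes that at most one window of data can be unacknowledged per round trip.

```c
#include <assert.h>

/* Payload bytes per millisecond at a given rate (1 Mb/s = 125 bytes/ms).
 * Header overhead is ignored; this is an illustrative approximation. */
static double bytes_per_ms(double mbit_per_s)
{
    return mbit_per_s * 1e6 / 8.0 / 1000.0;
}

/* Full-size segments per millisecond needed to sustain the rate. */
static double segments_per_ms(double mbit_per_s, int payload_bytes)
{
    assert(payload_bytes > 0);
    return bytes_per_ms(mbit_per_s) / payload_bytes;
}

/* Data that piles up at the source while nothing is transmitted. */
static double stall_backlog_bytes(double mbit_per_s, double stall_ms)
{
    return bytes_per_ms(mbit_per_s) * stall_ms;
}

/* Upper bound on throughput (Mb/s) when at most window_bytes can be
 * in flight, unacknowledged, during one round-trip time. */
static double window_limited_mbit(double window_bytes, double rtt_ms)
{
    assert(rtt_ms > 0.0);
    return window_bytes * 8.0 / (rtt_ms * 1000.0);
}
```

With the numbers from the original post, segments_per_ms(20.0, 1460) comes to about 1.7 segments per millisecond, and stall_backlog_bytes(20.0, 150.0) is 375000 bytes that the acquisition side would have to buffer across one stall. As a purely hypothetical illustration of the window question, window_limited_mbit(2048.0, 1.0) is about 16.4 Mb/s, so a window or send buffer of that size would already fall short of 20Mb/s at a 1ms RTT. The real limits depend on the window the PC advertises in the capture and on the send-side settings in the port's opt.h, so those values need to be read from the actual setup.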