Pardon me again for barging in. Kieran's analysis, particularly
regarding an unmotivated retransmit, sounded very familiar. I had a
problem like this at one of my clients. We changed two things and it
then went away.

First, we found and fixed a problem with tcp_tmr. It was running
in the wrong task context; it must run in the tcpip thread. The lwIP
core is not thread-safe, so a timer running in another task can touch
TCP state while the tcpip thread is using it. The usual method for
fixing this is to make the initial call to sys_timeout from within the
callback function that executes when tcpip initialization is done.
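
A minimal sketch of that pattern follows. It is not code from the
project in question. The lwIP names (tcpip_init, sys_timeout, tcp_tmr,
TCP_TMR_INTERVAL) are replaced here by small stand-ins so that the
fragment compiles on its own; the stand-in bodies, the 100 ms interval,
the fake_fire_timeout helper, and the names tcp_timer_cb and
tcpip_init_done are all assumptions made for illustration. In a real
port you would include the lwIP headers instead and check the
signatures against your lwIP version.

```c
#include <stddef.h>

/* ---- Stand-ins for lwIP, so the sketch builds without the stack. ---- */
typedef unsigned int u32_t;
typedef void (*sys_timeout_handler)(void *arg);

#define TCP_TMR_INTERVAL 100 /* ms; assumed value, check tcp.h in your tree */

sys_timeout_handler pending_handler; /* one-slot fake timeout list */
void *pending_arg;
int tcp_tmr_calls;                   /* counts fake tcp_tmr() invocations */

/* Fake sys_timeout: remembers a single one-shot timeout. */
void sys_timeout(u32_t msecs, sys_timeout_handler h, void *arg)
{
    (void)msecs;
    pending_handler = h;
    pending_arg = arg;
}

/* Fake tcp_tmr: the real one drives TCP retransmission and other timers. */
void tcp_tmr(void)
{
    tcp_tmr_calls++;
}

/* Fake tcpip_init: the real one starts the tcpip thread, and that
 * thread calls initfunc. Here the callback is simply called directly. */
void tcpip_init(void (*initfunc)(void *), void *arg)
{
    if (initfunc != NULL) {
        initfunc(arg);
    }
}

/* ---- The pattern itself. ---- */

/* Timeout handler: runs in the context that owns the timeout list (the
 * tcpip thread in real lwIP). Timeouts are one-shot, so it re-arms. */
void tcp_timer_cb(void *arg)
{
    (void)arg;
    tcp_tmr();
    sys_timeout(TCP_TMR_INTERVAL, tcp_timer_cb, NULL);
}

/* Init-done callback: the first sys_timeout call is made from here, so
 * the timeout is registered with the tcpip thread and not with whatever
 * task happened to call tcpip_init. */
void tcpip_init_done(void *arg)
{
    (void)arg;
    sys_timeout(TCP_TMR_INTERVAL, tcp_timer_cb, NULL);
}

/* Hypothetical helper for exercising the sketch: fires the pending
 * fake timeout once. Returns 1 if a handler ran, 0 otherwise. */
int fake_fire_timeout(void)
{
    sys_timeout_handler h = pending_handler;
    void *a = pending_arg;

    if (h == NULL) {
        return 0;
    }
    pending_handler = NULL;
    h(a);
    return 1;
}
```

In an application, startup would call tcpip_init(tcpip_init_done, NULL)
and nothing else would ever call tcp_tmr directly.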

Second, we found that we weren't using the lightweight protection
option (SYS_LIGHTWEIGHT_PROT) that I mentioned to you earlier.
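
For reference, a configuration fragment showing what enabling that
option usually looks like. This assumes the option meant here is
SYS_LIGHTWEIGHT_PROT in lwipopts.h, and that your port's sys_arch
supplies sys_arch_protect() and sys_arch_unprotect(); check both
against your lwIP version and port.

```c
/* lwipopts.h (fragment) */

/* Enable lightweight protection of short critical regions, such as
 * pbuf and memory pool bookkeeping. With this set to 1, the port must
 * implement sys_arch_protect() and sys_arch_unprotect(). */
#define SYS_LIGHTWEIGHT_PROT 1
```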

I think it was actually the first item that was causing the retransmit
problem, but we never found out for sure. Resource conflicts are really
difficult to track down. Once the problem went away, we stopped
working on it.
Tom C. Barker wrote:
Thanks for your analysis, Kieran. Forgive my confusion about
which ACKs are which: I was speaking of the multiple ACKs
the client sends back. ".65", the problem node, is in fact
the lwIP ftp server.
I have all my DEBUG statements on and find that I never get
a tcp_enqueue of the missing packet. It just skips over it.
This issue is my only priority right now, so if you or anyone
else has ideas about what I can watch for, I am open to them.
Meanwhile I'm crafting a bit-patterned file to help identify where
the problem is occurring.
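
One possible shape for such a file, as a sketch only: every aligned
4-byte word holds its own file offset in big-endian order, so any
payload seen in a capture identifies where in the file it came from.
The function names and the word layout are assumptions for
illustration, not what was actually used in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill buf so that each aligned 4-byte word contains its own file
 * offset, big-endian. base_offset is the file offset of buf[0] and
 * should be a multiple of 4. Any trailing 1 to 3 bytes are zeroed. */
void fill_offset_pattern(uint8_t *buf, size_t len, uint32_t base_offset)
{
    size_t i;

    for (i = 0; i + 4 <= len; i += 4) {
        uint32_t off = base_offset + (uint32_t)i;

        buf[i]     = (uint8_t)(off >> 24);
        buf[i + 1] = (uint8_t)(off >> 16);
        buf[i + 2] = (uint8_t)(off >> 8);
        buf[i + 3] = (uint8_t)off;
    }
    for (; i < len; i++) {
        buf[i] = 0;
    }
}

/* Recover the file offset stored in the 4-byte word starting at p. */
uint32_t read_offset_word(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Subtracting the offsets read from two captured segments and comparing
the result with the difference of their sequence numbers shows whether
a segment carries the data its sequence number claims. A payload need
not start on a word boundary, so the first complete word may sit up to
three bytes into the segment.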
Tom
-----Original Message-----
From: address@hidden [mailto:address@hidden] On Behalf Of Kieran Mansley
Sent: Friday, March 04, 2005 1:29 AM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] FTP-DATA exchange: TCP issues
On Thu, 2005-03-03 at 09:54 -0800, Tom C. Barker wrote:
Hello,
Maybe to short-circuit this issue, I am working with
0.7.2 and am in the process of moving to 1.1.0 so if
the following problem resembles a bug prior to 1.1.0,
please let me know.
In testing an ftp implementation where I will occasionally
successfully transfer a 400k file, I have come across a
consistently reproducible issue where my lwIP ftp server
seems to have dropped an ACK in that according to the
attached (truncated-packets) ethereal file, the packet on
line 249 should have ACK'd 264364, but instead ACKs 267284.
The rest of the (doomed) transaction is spent trying to
shoehorn in a few packets to the client's unacked queue.
Your description doesn't seem to match the trace that you've attached.
There is no packet there that ACKs 267284.
However, there is clearly something going wrong in that data transfer.
The problem seems to me to start with packet 245, which (i) is a
retransmission (of packet 242) when none seems necessary and (ii)
doesn't have the same payload as the earlier transmission of the same
data. Looks to me like packet 245 has got the wrong sequence number on
it, and it is in fact the payload of the next in-order packet.
Something similar happens with packets 244 and 247: 247 is a
retransmission of 244 that does not seem to be necessary, although this
time they both have the same payload.
What's more worrying is that the ".65" node then fails to retransmit the
correct data when it should: it gets many duplicate acknowledgements for
264364, which should lead it to retransmit that packet, but it refuses.
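
For readers unfamiliar with the rule being referred to, here is a
simplified sketch of the standard TCP fast retransmit trigger: the
third duplicate ACK for the same sequence number prompts a retransmit
of the segment starting there. This is not lwIP's code. The struct, the
function name, and the omission of the other duplicate-ACK conditions
(no payload, unchanged window, outstanding data) are simplifications
made for illustration.

```c
#include <stdint.h>

#define DUPACK_THRESHOLD 3 /* standard fast retransmit threshold */

struct dupack_state {
    uint32_t last_ack; /* highest ACK number seen so far */
    unsigned dupacks;  /* duplicates of last_ack seen since it arrived */
};

/* Process one incoming ACK number. Returns 1 exactly when this ACK is
 * the third duplicate, meaning the segment starting at ackno should be
 * retransmitted now; returns 0 otherwise. */
int on_ack(struct dupack_state *s, uint32_t ackno)
{
    if (ackno == s->last_ack) {
        s->dupacks++;
        return s->dupacks == DUPACK_THRESHOLD;
    }
    /* A different ACK number: treat it as new and restart the count. */
    s->last_ack = ackno;
    s->dupacks = 0;
    return 0;
}
```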
I can't explain this in full, but hopefully that will give you some
clues about what might be wrong. You could compare the captured
payloads against the file that is being transferred to check my theory
about 245 having the wrong sequence number.
Kieran
_______________________________________________
lwip-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/lwip-users
--
Jim Gibbons
address@hidden
Gibbons and Associates, Inc.
900 Lafayette, Suite 704, Santa Clara, CA
TEL: (408) 984-1441
FAX: (408) 247-6395