Re: [lwip-users] Recent tcp_rexmit() changes


From: Sam Jansen
Subject: Re: [lwip-users] Recent tcp_rexmit() changes
Date: Tue, 27 Jul 2004 13:34:09 +1200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040616

K.J. Mansley wrote:
> On Sun, 2004-07-25 at 20:46, Karl Jeacle wrote:
>
>> The timeout would be OK, and as expected, if it happened just once, but a
>> timeout is taking place at each RTT... the sender is stalled for 500ms at
>> a time instead of one RTT at a time. I hope I am making some sense!
>
> Yes, I can see your problem.  You'll end up with retransmitted segments
> (sent every 500ms) interleaving with new segments (when an ACK for a
> retransmitted one is received), and a lot of timeouts will be necessary
> to recover.  I think, although can't be sure, that this is the intended
> behaviour.  Complex cases such as this are rarely used when illustrating
> protocols though, so if anyone knows otherwise, or has a few minutes to
> see how a Linux or BSD stack behaves in this scenario, I'd be very
> interested to hear from them.

After a retransmit timeout, the sender should be in slow start. lwIP seems to miss this fundamental point. RFC 2001 states:

"Therefore, after retransmitting the dropped segment the TCP sender uses the slow start algorithm to increase the window from 1 full-sized segment to the new value of ssthresh, at which point congestion avoidance again takes over."

It is because of this missing functionality that lwIP behaves poorly when there is enough loss that fast retransmit cannot cope and a timeout occurs.
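
To make that concrete, the RFC 2001 timeout handling boils down to roughly the following. This is only a minimal sketch; the struct fields and function names are invented for illustration and are not lwIP's actual pcb members:

#include <stdint.h>

/* Illustrative sketch only: names are invented, not lwIP's pcb fields. */
struct conn {
  uint32_t cwnd, ssthresh, snd_wnd, mss;
};

static void on_retransmit_timeout(struct conn *c)
{
  uint32_t flight = c->cwnd < c->snd_wnd ? c->cwnd : c->snd_wnd;

  c->ssthresh = flight / 2;        /* halve the effective window...       */
  if (c->ssthresh < 2 * c->mss)
    c->ssthresh = 2 * c->mss;      /* ...but keep at least two segments   */
  c->cwnd = c->mss;                /* one segment: back into slow start   */

  /* Then retransmit the oldest unacked segment.  cwnd grows by one mss
   * per ACK until it reaches ssthresh, where congestion avoidance takes
   * over. */
}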

> That is a possibility, and although I don't like dumping the whole
> unacked queue on the unsent queue just in case it's necessary, it would
> solve your problem.  My only worry is that it might result (if we're not
> careful) in a large number of segments being put on the network as a
> result of a loss, which is completely the opposite of what the sender
> should be doing.

I see why you might be concerned; the behaviour I observed before the fast retransmit fix showed exactly that happening. However, this will not happen if lwIP enters slow start correctly. I have implemented this behaviour, and lwIP now behaves well enough during loss.

Changes I made:

* Added a tcp_rexmit_rto function that is a clone of the old tcp_rexmit function
* Made sure this function was called AFTER cwnd is set to 1 mss in tcp_slowtmr
* Uncommented the old code which allowed acks to ack data in the unsent queue
* Made a small modification to tcp_output which checks the sequence number of the packet just sent. This was needed because a packet sent with a fast retransmit would end up on the end of the unacked queue, even though it should be at the start of the queue.

Attached is a diff. I made it against the stable version, but it looks like it applies fine to HEAD.

> I think this is the best way to proceed, and I believe it's designed to
> solve almost exactly this problem.  I don't think it will involve too
> much work, so may have a look later today.  Perhaps if I do get
> something coded up you'd be willing to test/debug it for us?

I'm not certain SACK is all it's thought to be. My own research in this area, as well as recent analysis, suggests that SACK almost never makes a difference. However, there are other strategies you can employ to improve throughput.

Consider this: FreeBSD does not implement SACK (though I have heard it is making its way into -CURRENT) while OpenBSD does. Yet FreeBSD vastly outperforms OpenBSD, even in lossy situations. I've also measured Linux (2.4.20 at the time) in laboratory tests and found no real difference in throughput with or without SACK.

The most interesting research I have seen that improves throughput under loss is the (somewhat) recent Westwood congestion control algorithm, which has been part of Linux since 2.4.26 (and 2.6.something). I've found it's quite an improvement on its own, though with the help of SACK it's even better.

See:
http://www-ictserv.poliba.it/mascolo/tcp%20westwood/homeW.htm
and
http://www-ictserv.poliba.it/mascolo/tcp%20westwood/Tech_Rep_07_03_S.pdf
for information on Westwood.
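
Very roughly, Westwood's trick is to set ssthresh from a bandwidth estimate (filtered from the ACK arrival rate) multiplied by the minimum RTT, instead of blindly halving the window. A simplified sketch of the idea, with invented names and no claim to match the Linux implementation:

#include <stdint.h>

/* Simplified sketch of the TCP Westwood idea; names and units are
 * illustrative only. */
struct westwood {
  uint32_t bwe_bps;     /* bandwidth estimate filtered from the ACK rate */
  uint32_t rtt_min_ms;  /* minimum RTT seen on the connection            */
  uint32_t cwnd, ssthresh, mss;
};

/* ssthresh = estimated bandwidth * minimum RTT, i.e. the pipe size. */
static uint32_t westwood_ssthresh(const struct westwood *w)
{
  uint32_t pipe = (uint32_t)((uint64_t)w->bwe_bps / 8 * w->rtt_min_ms / 1000);
  return pipe < 2 * w->mss ? 2 * w->mss : pipe;
}

static void on_three_dupacks(struct westwood *w)
{
  w->ssthresh = westwood_ssthresh(w);
  if (w->cwnd > w->ssthresh)
    w->cwnd = w->ssthresh;   /* shrink only as far as the pipe allows */
}

static void on_timeout(struct westwood *w)
{
  w->ssthresh = westwood_ssthresh(w);
  w->cwnd = w->mss;          /* still enter slow start after an RTO   */
}

The difference from Reno shows up exactly in the lossy case discussed above: after random (non-congestion) loss the bandwidth estimate is still high, so ssthresh stays close to the real pipe size and the connection recovers much faster.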

Failing that, perusing the sources of FreeBSD shows all sorts of tricks they use to improve performance at little cost.

--
Sam Jansen                                           address@hidden
Wand Network Research Group             http://www.wand.net.nz/~stj2
Index: src/core/tcp.c
===================================================================
RCS file: /home/stj2/cvs/nsc/lwip/src/core/tcp.c,v
retrieving revision 1.1
diff -u -r1.1 tcp.c
--- src/core/tcp.c      1 Jun 2004 20:54:22 -0000       1.1
+++ src/core/tcp.c      27 Jul 2004 01:26:50 -0000
@@ -610,7 +610,6 @@
         if (pcb->state != SYN_SENT) {
           pcb->rto = ((pcb->sa >> 3) + pcb->sv) << tcp_backoff[pcb->nrtx];
         }
-        tcp_rexmit(pcb);
         /* Reduce congestion window and ssthresh. */
         eff_wnd = LWIP_MIN(pcb->cwnd, pcb->snd_wnd);
         pcb->ssthresh = eff_wnd >> 1;
@@ -620,6 +619,9 @@
         pcb->cwnd = pcb->mss;
         LWIP_DEBUGF(TCP_CWND_DEBUG, ("tcp_slowtmr: cwnd %u ssthresh %u\n",
                                 pcb->cwnd, pcb->ssthresh));
+
+        /* The following needs to be called AFTER cwnd is set to one mss - STJ */
+        tcp_rexmit_rto(pcb);
       }
     }
     /* Check if this PCB has stayed too long in FIN-WAIT-2 */
Index: src/core/tcp_out.c
===================================================================
RCS file: /home/stj2/cvs/nsc/lwip/src/core/tcp_out.c,v
retrieving revision 1.2
diff -u -r1.2 tcp_out.c
--- src/core/tcp_out.c  16 Jul 2004 06:03:25 -0000      1.2
+++ src/core/tcp_out.c  27 Jul 2004 00:57:47 -0000
@@ -462,8 +462,16 @@
         pcb->unacked = seg;
         useg = seg;
       } else {
-        useg->next = seg;
-        useg = useg->next;
+        /* In the case of fast retransmit, the packet should not go to the end
+         * of the unacked queue, but rather at the start. We need to check for
+         * this case. -STJ Jul 27, 2004 */
+        if (TCP_SEQ_LT(ntohl(seg->tcphdr->seqno), ntohl(useg->tcphdr->seqno))) {
+          seg->next = pcb->unacked;
+          pcb->unacked = seg;
+        } else {
+          useg->next = seg;
+          useg = useg->next;
+        }
       }
     } else {
       tcp_seg_free(seg);
@@ -566,6 +574,33 @@
   ip_output(p, local_ip, remote_ip, TCP_TTL, 0, IP_PROTO_TCP);
   pbuf_free(p);
  LWIP_DEBUGF(TCP_RST_DEBUG, ("tcp_rst: seqno %lu ackno %lu.\n", seqno, ackno));
+}
+
+void
+tcp_rexmit_rto(struct tcp_pcb *pcb)
+{
+  struct tcp_seg *seg;
+
+  if (pcb->unacked == NULL) {
+    return;
+  }
+
+  /* Move all unacked segments to the unsent queue. */
+  for (seg = pcb->unacked; seg->next != NULL; seg = seg->next);
+  seg->next = pcb->unsent;
+  pcb->unsent = pcb->unacked;
+  pcb->unacked = NULL;
+
+  pcb->snd_nxt = ntohl(pcb->unsent->tcphdr->seqno);
+
+  ++pcb->nrtx;
+
+  /* Don't take any rtt measurements after retransmitting. */
+  pcb->rttest = 0;
+
+  /* Do the actual retransmission. */
+  tcp_output(pcb);
+
 }
 
 void
Index: src/core/tcp_in.c
===================================================================
RCS file: /home/stj2/cvs/nsc/lwip/src/core/tcp_in.c,v
retrieving revision 1.2
diff -u -r1.2 tcp_in.c
--- src/core/tcp_in.c   16 Jul 2004 06:03:25 -0000      1.2
+++ src/core/tcp_in.c   27 Jul 2004 01:31:20 -0000
@@ -817,8 +817,11 @@
        in fact have been sent once. */
     /* KJM 13th July 2004
        I don't think is is necessary as we no longer move all unacked
-       segments on the unsent queue when performing retransmit */
-    /*
+       segments on the unsent queue when performing retransmit 
+       
+       STJ 27 July 2004
+       Actually we need to again!
+       */
     while (pcb->unsent != NULL &&
           TCP_SEQ_LEQ(ntohl(pcb->unsent->tcphdr->seqno) + TCP_TCPLEN(pcb->unsent),
                        ackno) &&
@@ -843,7 +846,6 @@
         pcb->snd_nxt = htonl(pcb->unsent->tcphdr->seqno);
       }
     }
-    */
 
     /* End of ACK for new data processing. */
 
