From: vr roriz
Subject: [lwip-users] [TCP raw API] Nagle + tcp_output interaction (behavior in 24 throughput tests)
Date: Thu, 11 Oct 2018 15:42:06 +0200

Dear colleagues,

I am writing my master's thesis in a project using the raw API of lwip-2.0.3. Although my implementation works, I want to understand an interaction between the Nagle algorithm and the way I call (or do not call) tcp_output, because I am not quite sure what is happening.

When a TCP write is requested, the sender function is invoked: sender(data[ ], size, send_now). Its pseudocode is:
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
sender(*data, size, send_now) {

    left = size       // how much data is left to be put in the queue
    available = 0     // how much data can be put in the queue right now
    pos = 0           // current position in the data vector to be put in the queue
    err = ERR_OK      // holds the result of the last tcp_write attempt

    while (left > 0) {
        do {
            available = tcp_sndbuf(pcb)

            if ((left <= available) && (available > 0)) {
                // everything that is left fits into the send buffer
                err = tcp_write(pcb, &data[pos], left, TCP_WRITE_FLAG_COPY)
                if (err == ERR_OK) {
                    left = 0
                    if (send_now) {
                        tcp_output(pcb)
                    }
                } else { // err == ERR_MEM
                    block and wait for the tcp_sent callback, indicating that sent data was ACKed
                }

            } else if ((left > available) && (available > 0)) {
                // only part of the remaining data fits into the send buffer
                err = tcp_write(pcb, &data[pos], available, TCP_WRITE_FLAG_MORE | TCP_WRITE_FLAG_COPY)
                if (err == ERR_OK) {
                    left = left - available
                    pos = pos + available
                } else { // err == ERR_MEM
                    block and wait for the tcp_sent callback, indicating that sent data was ACKed
                }

            } else { // available == 0
                block and wait for the tcp_sent callback, indicating that sent data was ACKed
            }

        } while (err == ERR_MEM)
    }
}
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
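For completeness, the "block and wait" steps above are driven by the tcp_sent callback. A minimal sketch of how that hook could look, assuming a hypothetical OS signalling primitive signal_sender_unblock() (the name is made up for illustration):

#include "lwip/tcp.h"

/* Hypothetical OS primitive that wakes the blocked sender task. */
extern void signal_sender_unblock(void);

/* tcp_sent callback: lwIP calls this when previously sent data was ACKed,
 * i.e. send buffer space has been freed and the sender can retry tcp_write. */
static err_t on_sent(void *arg, struct tcp_pcb *tpcb, u16_t len)
{
    LWIP_UNUSED_ARG(arg);
    LWIP_UNUSED_ARG(tpcb);
    LWIP_UNUSED_ARG(len);
    signal_sender_unblock();
    return ERR_OK;
}

/* Registered once after the connection is set up: */
/* tcp_sent(pcb, on_sent); */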

Initially I did not have a send_now option, so I was always calling tcp_output whenever there was enough space to enqueue all the remaining data. The option was requested because the team writing the application needs two specific TCP segments to be sent together, even when they call sender twice in a row. Therefore I have to give the application control over when tcp_output is called: if "the window size >= MSS and available data is >= MSS" [1], or if there is NO unconfirmed data still in the pipe, the segment will be sent immediately even with the Nagle algorithm enabled. The application hits the second case: it sends data smaller than the MSS while there is no unconfirmed data in the pipe. A usage sketch is shown below.
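Hypothetical caller-side sketch (the function name app_send_pair and the argument names are made up for illustration): the application enqueues the first chunk without flushing, then enqueues the second chunk and flushes, so both leave in the same tcp_output() push.

#include <stdint.h>

extern void sender(uint8_t *data, uint16_t size, int send_now);

void app_send_pair(uint8_t *first, uint16_t first_len,
                   uint8_t *second, uint16_t second_len)
{
    sender(first, first_len, 0);    /* enqueue only, no tcp_output() yet */
    sender(second, second_len, 1);  /* enqueue and flush both chunks now */
}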

So I added the send_now control option; with send_now = 0, tcp_output is left to be called by lwIP itself. I have searched for all references to tcp_output in the lwIP code. From what I understood, not considering retransmissions, connect/close, etc., tcp_output will be called from the TCP slow timer and at the end of tcp_input (with the comment /* Try to send something out. */). That makes sense: when an ACK is received, lwIP tries to flush the TCP Tx queue. My reading of the Nagle rule from [1] is sketched below.
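For reference, this is a simplified paraphrase of the decision described in [1]; it is NOT the actual lwIP source, and the identifiers (nagle_allows_send, bytes_queued, unacked_in_flight) are made up for illustration:

#include <stdint.h>

/* Simplified paraphrase of the Nagle rule from [1]; not lwIP code. */
int nagle_allows_send(uint16_t bytes_queued, uint16_t mss,
                      uint16_t window, int unacked_in_flight)
{
    if (bytes_queued >= mss && window >= mss) {
        return 1;   /* a full segment fits the window: send immediately    */
    }
    if (!unacked_in_flight) {
        return 1;   /* nothing in the pipe: even a small segment may go    */
    }
    return 0;       /* otherwise buffer the data until an ACK comes back   */
}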

OK, this strategy seems to work fine for our purposes, but I would like to understand the behavior during throughput tests. In the throughput test the application is a client: it connects to a TCP server and sends a defined amount of data every 1 ms period. The amount of data is derived from the throughput setpoint I set for the test. I ran the test for the 4 scenarios s = (Nagle, send_now) with 6 different throughput setpoints, so 24 tests in total. The server runs on a PC using Python sockets. The nodes are directly connected. The network layer is IPv6, so I configure the MSS to be 1500 (ETH MTU) - 40 (IPv6 header) - 20 (TCP header) = 1440 bytes; the corresponding option is sketched below, and the exact configuration is in the attached lwipopts.h.
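A minimal sketch of that setting (only TCP_MSS reflects the calculation above; the buffer and window sizes are illustrative placeholders, the real values are in the attached lwipopts.h):

/* lwipopts.h (excerpt, sketch) */
#define TCP_MSS       1440              /* 1500 - 40 (IPv6 hdr) - 20 (TCP hdr) */
#define TCP_SND_BUF   (8 * TCP_MSS)     /* example send buffer sizing          */
#define TCP_WND       (8 * TCP_MSS)     /* example receive window sizing       */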


------------------------------------------ Test summary - Task period = 1 ms - MSS = 1440 (IPv6) ------------------------------------------

throughput_setpoint = 1 Mbps (125 bytes / period)
test_id = 1  : s = (0,0) -- throughput below setpoint and floating, RTT between 170 ms and 200 ms
test_id = 2  : s = (0,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms
test_id = 3  : s = (1,0) -- throughput below setpoint and floating, RTT between 170 ms and 200 ms
test_id = 4  : s = (1,1) -- throughput_measured = throughput_setpoint, stable

throughput_setpoint = 10 Mbps (1250 bytes / period)
test_id = 5  : s = (0,0) -- throughput below setpoint and floating, RTT between 1 ms and 200 ms (a bit better)
test_id = 6  : s = (0,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms
test_id = 7  : s = (1,0) -- throughput below setpoint and floating, RTT between 1 ms and 200 ms (a bit better)
test_id = 8  : s = (1,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms

throughput_setpoint = 25 Mbps (3125 bytes / period)
test_id = 9  : s = (0,0) -- throughput below setpoint and floating, RTT between 1 ms and 200 ms (a bit better)
test_id = 10 : s = (0,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms
test_id = 11 : s = (1,0) -- throughput below setpoint and floating, RTT between 1 ms and 200 ms (a bit better)
test_id = 12 : s = (1,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms

throughput_setpoint = 35 Mbps (4375 bytes / period)
test_id = 13 : s = (0,0) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms (for this amount of data the RTT drops and the setpoint is reached)
test_id = 14 : s = (0,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms
test_id = 15 : s = (1,0) -- throughput below setpoint and floating, RTT between 1 ms and 200 ms (a bit better)
test_id = 16 : s = (1,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms

throughput_setpoint = 45 Mbps (5625 bytes / period)
test_id = 17 : s = (0,0) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms (for this amount of data the RTT drops and the setpoint is reached)
test_id = 18 : s = (0,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms
test_id = 19 : s = (1,0) -- throughput below setpoint and floating, RTT between 1 ms and 200 ms (a bit better)
test_id = 20 : s = (1,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms

throughput_setpoint = 49 Mbps (6125 bytes / period)
test_id = 21 : s = (0,0) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms (for this amount of data the RTT drops and the setpoint is reached)
test_id = 22 : s = (0,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms
test_id = 23 : s = (1,0) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms (for this amount of data the RTT drops and the setpoint is reached)
test_id = 24 : s = (1,1) -- throughput_measured = throughput_setpoint, stable, RTT close to 1 ms

--------------------------------------------------------------------------------------------------------------------------------------------

For the (0, 1) and (1, 1) cases (send_now always 1):
It does not matter whether Nagle is ON or OFF; according to the Wireshark measurements I can always achieve the throughput setpoint, up to 49 Mbps. This is the maximum value we can reach, because the OS is message-passing based and we have limited the maximum length of a message in the OS, which limits how much data the application can enqueue per period.

For the (0, 0) cases (Nagle = 0 and send_now = 0):
If I write 4375 or more bytes per period (tests 13, 17 and 21), the RTT decreases and the throughput is achieved. I do not understand why things change at this point. I thought it could be related to delayed ACKs from the server. I changed the advertised window size (on the server side) to smaller values and also set the socket's TCP_NODELAY option to 1, but the overall behavior is the same.

For the (1, 0) cases (Nagle = 1 and send_now = 0):
The behavior is similar to (0, 0): the RTT starts very high and improves as more data is sent, but it only gets back to around 1 ms at test_id = 23, when we send 6125 bytes per period. Obviously the throughput suffers from the high RTT.

Therefore, I would like to understand what causes the RTT to reach such high values when send_now is 0, and to drop dramatically once a certain amount of data is sent per period. What else could it be, if not delayed ACKs? And why do we observe the change from test_id = 13 onwards (for Nagle = 0) but only at test_id = 23 (for Nagle = 1)?

--------
Attachments:
Can be downloaded from: https://github.com/vitorroriz/lwip-tests
* lwipopts.h
* test_description_table summarizes the test configurations.
* test_wireshark_tracefiles:
Wireshark trace files named testX_Na_SNb_Ty, where X is the test ID, "Na" is N0 (Nagle off) or N1 (Nagle on), "SNb" is SN0 (send_now = 0) or SN1 (send_now = 1), and "Ty" is the throughput setpoint y. The Toyota device is the client in the trace files.
--------
Refs:
 [1]  https://en.wikipedia.org/wiki/Nagle%27s_algorithm


Sorry for the long email, but I think this set of tests with behavior analysis can be quite useful for future developers, since it is not easy to find complete throughput tests in the forums.
Thank you very much!

Kind regards,
Vitor

