Re: [lwip-users] TCP fragmentation over PPP - fragments are lost in LwIP

Hi Simon, Hi Sylvain,

we've found it!

We're not using nonblocking connections as this never did work properly … in my today's debug session I've found why. In api_lib.c, line 610 this code just prevents using NETCONN_WRITE for nonblocking connections:

dontblock = netconn_is_nonblocking(conn) || (apiflags & NETCONN_DONTBLOCK);

if (dontblock && !bytes_written) {

/* This implies netconn_write() cannot be used for non-blocking send, since

it has no way to return the number of bytes written. */

return ERR_VAL;

}

To avoid a too long blocking we're using send timeouts. And here was the issue.

As Neil already correctly has found, a part of the data was not written into the TCP send buffer. This happened in do_writemore in api_msg.c in the send timeout handling:

if (conn->write_offset == 0) {

/* nothing has been written */

err = ERR_WOULDBLOCK;

conn->current_msg->msg.w.len = 0;

} else {

/* partial write */

err = ERR_OK;

conn->current_msg->msg.w.len = conn->write_offset;

conn->write_offset = 0;

}

If NETCONN_WRITE wants to write data and the remaining space in the TCP send buffer is less than the data to be written, do_writemore will write as many data as it can. As the outgoing flow is quite slow, it'll then run into the send timeout to avoid a too long blocking. We'll get the "partial write" condition above and it'll return an ERR_OK which is passed as the result of the NETCONN_WRITE, so my application routing assumes the data is all written. In fact, it's only partly written and the data which did not match into the TCP_SNDBUF is lost here (which is the loss we have faced).

Giving back an ERR_OK where data is definitely lost is clearly a bug, so as a solution we've changed the SNDTIMEO handling in do_writemore in this way:

#if LWIP_SO_SNDTIMEO

if (conn->send_timeout != 0)

{

dataptr = (u8_t*)conn->current_msg->msg.w.dataptr + conn->write_offset;

diff = conn->current_msg->msg.w.len - conn->write_offset;

if (diff > 0xffffUL) /* max_u16_t */

{

len = 0xffff;

}

else

{

len = (u16_t)diff;

}

available = tcp_sndbuf(conn->pcb.tcp);

if (available < len)

{

err = ERR_WOULDBLOCK;

goto err_mem;

}

if ((conn->send_timeout != 0) &&

((s32_t)(sys_now() - conn->current_msg->msg.w.time_started) >= conn->send_timeout)) {

write_finished = 1;

if (conn->write_offset == 0) {

/* nothing has been written */

err = ERR_WOULDBLOCK;

conn->current_msg->msg.w.len = 0;

} else {

/* partial write */

err = ERR_OK;

conn->current_msg->msg.w.len = conn->write_offset;

conn->write_offset = 0;

}

} else

#endif /* LWIP_SO_SNDTIMEO */

If the timeout is enabled and a sent_timeout is set for the connection, we check if the data to be sent will completely fit into the TCP_SNDBUF. If this is not the case, as we have no way to return the amount of unwritten bytes to the NETCONN_WRITE output, we'll not write anything but return an ERR_WOULDBLOCK back. That's a proper return for the my upper layers to know that the data was not sent. It'll kept in my output queue and my IP-Server task will periodically retry to send the data out.

With that modification we don't see any data loss anymore.

This condition should however be checked from your side as it allows a potential data loss without the application getting aware of it.

Thank you for your time and ideas in that issue!

Marco

From:	Marco Jakobs
Subject:	Re: [lwip-users] TCP fragmentation over PPP - fragments are lost in LwIP if data is queued for sending
Date:	Mon, 28 Nov 2016 12:07:45 +0100