Re: [lwip-users] LWIP - TCP receive assert failed

From: Jackie
Date: Fri, 16 Jan 2015 23:46:02 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.3.0

Hi Sylvain,

Thanks for your reply. I've been working hard on this issue lately and found something interesting. I am using FTP as the upper-level application protocol, on top of a TCP connection in LWIP. For ease of testing, I use PPP to connect to the FTP server on a host PC, so the setup looks like this:

FTP client <---> TCP/IP (LWIP) <---> PPP <-----------------------> TCP/IP (Linux) <---> FTP server.

After a stress test of more than 10 hours of uploading data, plus debugging, I found that the PCB gets corrupted in tcp_output(). tcp_output() can block in the lower-level call made from tcp_output_segment() when the lower-layer protocol's buffer is full, which leaves the upper layer pending. Meanwhile the TCP timer keeps running, and tcp_slowtmr() also calls tcp_output(), so a second tcp_output() is entered before the previous call has finished:

tcp_output()
{
    ......
    tcp_output_segment(); /* may block here; meanwhile tcp_slowtmr() calls tcp_output(), which runs and returns */
    ......
    /* the outer call then goes on to update pcb->unacked and pcb->unsent */
    ......
}

Clearly pcb->unacked and pcb->unsent can be corrupted this way while pcb->snd_queuelen is left unchanged, resulting in a mismatch between the queue length and the segments actually held in the unacked and unsent queues. Eventually the program hits the assertion.
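The mismatch described above can be detected early with a consistency check. The following is only a minimal sketch: `struct seg`, `struct pcb`, the `pbuf_count` field, and both function names are simplified, hypothetical stand-ins rather than lwIP's real definitions (real code would count pbufs per segment with pbuf_clen()).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for lwIP's struct tcp_seg / struct tcp_pcb. */
struct seg {
    struct seg *next;
    unsigned    pbuf_count;   /* assumed field: pbufs held by this segment */
};

struct pcb {
    struct seg *unsent;
    struct seg *unacked;
    unsigned    snd_queuelen; /* expected total pbufs on both queues */
};

/* Sum the pbufs held by every segment on one queue. */
static unsigned queue_pbufs(const struct seg *s)
{
    unsigned n = 0;
    for (; s != NULL; s = s->next) {
        n += s->pbuf_count;
    }
    return n;
}

/* True when snd_queuelen agrees with the contents of both queues. */
bool pcb_queues_consistent(const struct pcb *p)
{
    return p->snd_queuelen == queue_pbufs(p->unsent) + queue_pbufs(p->unacked);
}

/* Debug helper: stop as soon as the bookkeeping drifts apart. */
void pcb_assert_consistent(const struct pcb *p)
{
    assert(pcb_queues_consistent(p));
}
```

Calling such a check right after each queue update tends to catch the corruption near its source instead of much later in tcp_receive().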

Since I am using a very old version of LWIP, I am not sure whether the problem exists in newer versions. In my opinion, tcp_output() would be better designed as a reentrant function: it may block when the lower-layer buffer is full, and would then wait for a "write signal" before continuing to send data.
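One common way to tolerate a nested call is a guard flag that defers the inner request until the outer call finishes. This is a sketch of that general pattern, not how lwIP implements tcp_output(); every name below (`in_output`, `output_pending`, `send_hook`, `guarded_output`, `timer_fires`) is hypothetical. It only covers nesting within one context; with real preemption the flags would themselves need protection.

```c
#include <stddef.h>

struct pcb {
    int in_output;                    /* hypothetical: set while output runs */
    int output_pending;               /* hypothetical: a nested request was seen */
    int passes;                       /* how many times the body has run */
    void (*send_hook)(struct pcb *);  /* stands in for the blocking lower layer */
};

int guarded_output(struct pcb *p);

/* Stands in for the real work; the hook models a timer firing mid-send. */
static void output_body(struct pcb *p)
{
    p->passes++;
    if (p->send_hook != NULL) {
        void (*hook)(struct pcb *) = p->send_hook;
        p->send_hook = NULL;          /* fire only once */
        hook(p);
    }
}

/* Returns 1 if the body ran in this call, 0 if the request was deferred. */
int guarded_output(struct pcb *p)
{
    if (p->in_output) {
        p->output_pending = 1;        /* let the outer call redo the work */
        return 0;
    }
    p->in_output = 1;
    do {
        p->output_pending = 0;
        output_body(p);
    } while (p->output_pending);
    p->in_output = 0;
    return 1;
}

/* Example nested caller, playing the role of tcp_slowtmr(). */
void timer_fires(struct pcb *p)
{
    (void)guarded_output(p);
}
```

With this shape the queues are only ever modified by the outermost call, and the nested request costs one extra pass instead of a corrupted list.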

As a workaround, I re-check the pcb after tcp_output_segment() returns, so that the local pointer useg points to the current tail of the unacked queue; otherwise the contents of the unacked queue can be overwritten.
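The idea of the workaround can be sketched as follows. This is not the actual patch: the structures are simplified stand-ins, and `seg_tail` and `after_segment_sent` are hypothetical names. The sketch assumes the function is called right after the possibly blocking send returns, and it re-validates the queues instead of trusting pointers cached before the call.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for lwIP's segment and pcb types. */
struct seg { struct seg *next; unsigned seqno; };
struct pcb { struct seg *unsent; struct seg *unacked; };

/* Walk to the current tail of a segment list (NULL for an empty list). */
static struct seg *seg_tail(struct seg *s)
{
    if (s == NULL) {
        return NULL;
    }
    while (s->next != NULL) {
        s = s->next;
    }
    return s;
}

/*
 * Call after the send of 'seg' returns. Returns 1 if 'seg' was moved from
 * unsent to unacked here, 0 if the queues changed underneath us and nothing
 * was touched.
 */
int after_segment_sent(struct pcb *p, struct seg *seg)
{
    struct seg *useg;

    if (p->unsent != seg) {
        return 0;                    /* a nested call already handled it */
    }
    p->unsent = seg->next;
    seg->next = NULL;

    useg = seg_tail(p->unacked);     /* fresh tail, not a stale cached one */
    if (useg == NULL) {
        p->unacked = seg;
    } else {
        useg->next = seg;
    }
    return 1;
}
```

A re-check like this narrows the window but does not remove the underlying race, so it complements rather than replaces serializing all access to the pcb.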

Do you have any concerns about this? Any suggestions or discussion are welcome.

Best,
Jackie

On 01/11/15 01:17, Sylvain Rochet wrote:

Hi Jackie,

On Mon, Jan 05, 2015 at 11:59:00PM +0800, Jackie wrote:

Hi all,

Recently, while stress-testing LWIP, e.g. by continuously uploading data
to a server over a TCP connection, the device often crashed on an assert
statement in tcp_receive(),

        if (pcb->snd_queuelen != 0) {
          LWIP_ASSERT("tcp_receive: valid queue length", pcb->unacked != NULL ||
                      pcb->unsent != NULL);
        }

After debugging the crash, I found a possible cause: the pcb structure is
corrupted by another thread during a context switch. I singled out one
likely candidate, tcp_slowtmr(). This timer calls tcp_pcb_purge(), which
resets both the unacked and unsent queues to NULL without setting queuelen
to 0. In some cases (for example when the TCP state is FIN_WAIT_2), the
timer interrupts the current TCP thread in a preemptive OS environment and
modifies the current pcb before the assert statement is reached.

How likely is this? Has anyone encountered a similar issue? Any
suggestions?

You are not specific enough for us to conclude anything, but, as usual, it
looks like a broken port or a usage which does not follow the lwIP
threading model.

Summary:

- Do *NOT* call anything in interrupt context, nothing, never, never;
use your OS semaphore to signal an Ethernet/serial/… RX thread instead
- memp_* functions are thread-safe if SYS_LIGHTWEIGHT_PROT is
set, and again, thread-safe does not mean interrupt-safe, especially
if your hardware does nested interrupts
- Do *NOT* call any function from the RAW API outside the lwIP thread
- Use the Netconn or Socket API in other threads, but keep in mind you
should not share a Netconn/Socket control block between threads (or use
proper locking if you really have to, of course).
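The rule about keeping RAW API calls inside the lwIP thread usually comes down to handing work over to that thread instead of calling into the stack directly; lwIP's tcpip_callback() serves this purpose when the tcpip thread is used. The toy model below only illustrates the pattern and is not lwIP code: `post_to_stack_thread`, `run_posted_work`, `bump_counter`, and the empty `QUEUE_LOCK`/`QUEUE_UNLOCK` placeholders are all hypothetical, and a real port would put its OS mutex or mailbox behind those placeholders.

```c
#include <stddef.h>

#define QUEUE_LEN 8

/* Placeholders: a real port would use its OS mutex or mailbox here. */
#define QUEUE_LOCK()   ((void)0)
#define QUEUE_UNLOCK() ((void)0)

typedef void (*work_fn)(void *ctx);

struct work { work_fn fn; void *ctx; };

static struct work queue[QUEUE_LEN];
static unsigned head, tail;   /* head: next to run, tail: next free slot */

/* Called from any other thread: enqueue only, never touch the stack. */
int post_to_stack_thread(work_fn fn, void *ctx)
{
    int ok = 0;
    QUEUE_LOCK();
    if (tail - head < QUEUE_LEN) {
        queue[tail % QUEUE_LEN].fn = fn;
        queue[tail % QUEUE_LEN].ctx = ctx;
        tail++;
        ok = 1;
    }
    QUEUE_UNLOCK();
    return ok;                /* 0 means the queue was full */
}

/* Called only from the stack's own thread: run everything queued so far. */
unsigned run_posted_work(void)
{
    unsigned ran = 0;
    for (;;) {
        struct work w;
        QUEUE_LOCK();
        if (head == tail) {
            QUEUE_UNLOCK();
            break;
        }
        w = queue[head % QUEUE_LEN];
        head++;
        QUEUE_UNLOCK();
        w.fn(w.ctx);          /* RAW-API-style calls belong here */
        ran++;
    }
    return ran;
}

/* Example work item: stands in for a RAW API call such as tcp_output(). */
void bump_counter(void *ctx)
{
    (*(int *)ctx)++;
}
```

Nothing runs at post time; the work executes only when the owning thread drains the queue, which is what keeps all pcb access in one context.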

Sylvain
