From: Christopher Head
Subject: [lwip-devel] [bug #61097] Deadlock due to inability to receive ACKs when all pbufs are full
Date: Tue, 31 Aug 2021 23:33:35 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

URL:
  <https://savannah.nongnu.org/bugs/?61097>

                 Summary: Deadlock due to inability to receive ACKs when all pbufs are full
                 Project: lwIP - A Lightweight TCP/IP stack
            Submitted by: hawk777
            Submitted on: Wed 01 Sep 2021 03:33:33 AM UTC
                Category: TCP
                Severity: 3 - Normal
              Item Group: Faulty Behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None
            lwIP version: Other

    _______________________________________________________

Details:

Consider an application that works in strict command-response mode; that is,
it receives a command, executes it, sends a response to the client, and then
goes back to receive the next command. Consider a client that sends commands,
but doesn’t read responses. Here’s what will happen; let’s say the
client sending the commands is running Linux:
1. Responses start piling up in the Linux receive buffer. The TCP window for
the lwIP→Linux direction starts to shrink, but for now data is still ACKed, so
it gets removed from the lwIP send queue.
2. The window closes completely. The application keeps generating responses
until the send queue is full.
3. tcp_sndbuf starts returning zero, so the application stops reading
commands—it intends to wait until the client has consumed some of the
responses, opening up window and send queue space, and then carry on as usual,
with no commands or responses lost, using TCP as flow control.
4. Because the application has stopped reading commands, they start piling up
in the receive buffers.
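The application side of steps 1–4 can be sketched as a toy model (plain C, none of this is lwIP API; the send queue stands in for tcp_sndbuf() accounting, and `rx_backlog` models commands left sitting in receive pbufs):

```c
#include <assert.h>

/* Toy model of steps 1-4: a fixed-size send queue stands in for the
 * tcp_sndbuf() budget, and commands the app declines to read pile up
 * as rx_backlog, each one pinning a receive buffer. */
enum { SND_QUEUE_MAX = 4 };

static int snd_queued = 0;   /* responses waiting for the window to open */
static int rx_backlog = 0;   /* unread commands still holding rx pbufs */

static void on_command_arrives(void)
{
    if (snd_queued < SND_QUEUE_MAX) {
        /* room in the send queue: consume the command, queue a response */
        snd_queued++;
    } else {
        /* tcp_sndbuf()-equivalent is zero: stop reading and rely on TCP
         * flow control -- the command stays buffered in a receive pbuf */
        rx_backlog++;
    }
}
```

Once the send queue fills, every further command only grows the backlog of pinned receive buffers, which is exactly the state steps 5–7 start from.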

Up to this point, everything works pretty much the same way on any operating
system. The problem comes when all the storage available (either lwIP heap
space for receive buffers, or dedicated receive buffers if you’re doing a
zero-copy driver, or lwIP heap space for receive pbuf structures, or whatever)
is exhausted. At this point:
5. The Linux client starts reading responses.
6. The Linux kernel sends ACKs reporting the opening of more window, which
should allow lwIP to send more response data.
7. There are no buffers or pbufs available in lwIP, so it can’t receive the
ACKs: the driver has nowhere to put the incoming frames, which are dropped
before the TCP layer ever sees the ACK numbers.
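Step 7 is the crux: even a pure ACK with zero payload needs a pbuf to exist in before tcp_input() can look at it. A minimal sketch of why (plain C standing in for a driver receive path; `pool_free`, `rx_frame`, and `tcp_acks_seen` are hypothetical names, not lwIP APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receive pool: every pbuf is pinned in the rx queue, so
 * the pool is empty. The driver cannot stash even a zero-payload pure
 * ACK, and the frame never reaches the TCP layer. */
static int pool_free = 0;      /* all pbufs sitting in the rx backlog */
static int tcp_acks_seen = 0;  /* ACKs that actually reached TCP */

static bool rx_frame(bool is_pure_ack)
{
    if (pool_free == 0) {
        /* no pbuf to copy the frame into: drop it, ACK and all */
        (void)is_pure_ack;
        return false;
    }
    pool_free--;
    if (is_pure_ack)
        tcp_acks_seen++;
    return true;
}
```

With the pool exhausted, the window-opening ACKs from step 6 never arrive, so the send queue never drains, so the receive buffers are never freed: deadlock.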

The obvious way to avoid this seemed to me to be to have enough receive buffer
space to hold a maximum window’s worth of receive data. But then I realized: what if
the Linux client sends one byte per segment? The TCP window size is measured
in bytes, but by sending one byte per segment, lwIP has to expend an entire
“struct pbuf” (and, in the case of zero-copy hardware, whatever chunk
buffer size the driver uses) *per byte* of window—in other words, even with
an unusually small, say, 500-byte window, you’d need enough heap space for
500 “struct pbuf”s at minimum, just to avoid deadlock with *one single TCP
connection*.
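Putting rough numbers on that (a back-of-the-envelope sketch; the 16-byte per-pbuf overhead is illustrative, not lwIP’s actual sizeof(struct pbuf)):

```c
#include <assert.h>
#include <stddef.h>

/* Worst case: the peer fills the advertised window one byte per segment,
 * so every byte of window costs a whole pbuf (plus whatever chunk size a
 * zero-copy driver allocates per frame). */
static size_t worst_case_pbufs(size_t window_bytes, size_t bytes_per_segment)
{
    /* one pbuf per segment, rounding up */
    return (window_bytes + bytes_per_segment - 1) / bytes_per_segment;
}

static size_t worst_case_memory(size_t window_bytes, size_t bytes_per_segment,
                                size_t per_pbuf_overhead)
{
    size_t n = worst_case_pbufs(window_bytes, bytes_per_segment);
    return n * (per_pbuf_overhead + bytes_per_segment);
}
```

A 500-byte window at one byte per segment needs 500 pbufs, versus a single pbuf if the peer sends full-sized segments: the memory bound depends on segment count, not window bytes.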

It seems to me that lwIP needs to have some kind of escape hatch, where either
the heap or the driver or something communicates to the TCP layer that it’s
running out of space, and at that point the TCP layer acts on the control
portion of incoming packets (e.g. ACK numbers, FIN/RST flags, etc.) but
doesn’t keep stashing (and ACKing) the incoming data.
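As a rough illustration of that escape hatch (standalone C, not lwIP code; the state and segment layouts are simplified, and `tcp_input_low_mem` and `under_memory_pressure` are invented names):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified TCP state and segment; real lwIP keeps these in struct
 * tcp_pcb and parses the actual header in tcp_input(). */
struct tcp_state { uint32_t snd_una; uint32_t rcv_nxt; };
struct segment   { uint32_t ack_no; uint32_t seq_no; uint16_t payload_len; };

/* Escape hatch: when memory is low, still act on the ACK number (which
 * frees send-queue space) but neither buffer nor ACK the payload, so
 * the peer retransmits it later once we have room again. */
static void tcp_input_low_mem(struct tcp_state *s, const struct segment *seg,
                              bool under_memory_pressure)
{
    if (seg->ack_no > s->snd_una)
        s->snd_una = seg->ack_no;          /* control info always processed */

    if (!under_memory_pressure && seg->seq_no == s->rcv_nxt)
        s->rcv_nxt += seg->payload_len;    /* normal path: accept the data */
    /* else: payload dropped; rcv_nxt unchanged, so no new ACK is sent
     * and normal TCP retransmission covers the discarded bytes */
}
```

The key property is that dropping the payload is safe: unACKed data is retransmitted, whereas a dropped window-update ACK in the full-buffer state is what causes the deadlock above.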

Or, maybe it should have an option to coalesce tiny pbufs in the receive queue
into a smaller number of larger pbufs, so that it’s possible to get a better
picture of the relationship between receive window and memory consumption and
set up a large enough heap?
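A coalescing pass could look roughly like this (standalone sketch with a toy buffer struct, not lwIP’s pbuf chain; newer lwIP does ship a pbuf_coalesce() helper, if I’m reading the source right, but it isn’t applied automatically to the receive queue):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a pbuf chain: each node owns a tiny payload. */
struct buf { struct buf *next; size_t len; unsigned char data[8]; };

/* Coalesce a chain of small buffers into one contiguous allocation,
 * freeing the originals; total length is returned via *out_len. Many
 * one-byte nodes collapse into a single buffer, so per-node overhead
 * is paid once instead of once per received byte. */
static unsigned char *coalesce(struct buf *head, size_t *out_len)
{
    size_t total = 0;
    for (struct buf *b = head; b != NULL; b = b->next)
        total += b->len;

    unsigned char *flat = malloc(total ? total : 1);
    size_t off = 0;
    while (head != NULL) {
        struct buf *next = head->next;
        memcpy(flat + off, head->data, head->len);
        off += head->len;
        free(head);            /* node returned to the pool immediately */
        head = next;
    }
    *out_len = total;
    return flat;
}
```

With something like this on the receive queue, the worst-case memory bound goes back to being proportional to window bytes rather than segment count.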

As far as I can tell, there isn’t anything the application side can do to
mitigate this, other than discard responses if there isn’t room in the send
queue, which rather defeats the point of considering TCP to be reliable and to
have flow control—and applications written this way work just fine in full
operating systems like Linux, though I’m not sure how the Linux kernel
actually avoids this problem (I assume the answer may involve having gobs and
gobs of memory available, possibly in combination with coalescing incoming
data).

The lwIP version is 2.1.2, but that option is missing from the dropdown.




    _______________________________________________________

Reply to this item at:

  <https://savannah.nongnu.org/bugs/?61097>

_______________________________________________
  Message sent via Savannah
  https://savannah.nongnu.org/



