Re: [lwip-devel] in-place overwriting of payload via static "tcphdr" poi

From:

Bernhart Pelger

Subject:

Re: [lwip-devel] in-place overwriting of payload via static "tcphdr" pointer.

Date:

Mon, 8 Jan 2018 11:18:14 +0100

User-agent:

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0

Hi Simon,

At first I also suspected the Xilinx driver to be at fault. But after tracing the problem I think
that Xilinx can't really correct the problem without a significant performance impact.

I mean, what is the Xilinx port driver (or any other driver for that matter) supposed to do with
received packets from the EMACs DMA ring buffer?

- Copy the all packets to a different memory location, just so LWIP can patch some fields in the TCP header?
- Completely disable the memory cache for all RX DMA buffers?

Both options would have a significant performance impact.

There is no real reason for writing into received packets anyways. You could byte-flip the TCP header fields on-the-fly when needed.
Please note that LWIP's UDP stack does not modify RX packets -- so actually it *is* possible to leave rx packets unmodified.

Honestly, I do not fully understand how this is not a problem for every lwip port driver, not just Xilinx.
Many systems issue an RX interrupt and use (non-cache-coherent) DMA to transfer data from EMAC to memory.
Since memory is not unlimited, new rx packets will at some point overwrite old rx packets (ring buffer).

As response to your post:

> How would you ensure no application writes into the application part of an rx pbuf?
- either by documenting that you are not allowed to modify rx pbufs.
- or by handing out 'const' pointers, (and documenting that you shouldn't cast away the 'const').

> Remember that rx pbufs are not 'const' for lwIP.
But they should be const.

Normally an application does not need to write into the received packets. I can't think of a good reason
for that, especially not for TCP, where you normally have to assemble/copy the data stream from multiple
pbuf payloads anyways! If you need to patch rx data at all, then why not patch the assembled stream data?

Patching rx data is also not good programming practice: multiple entities (EMAC and application/lwip) would write to the same memory.
This makes code less readable and can lead to race conditions.

Actually I tried to fix this issue by making all pointers to rx packet buffers 'const' in tcp_in.c, but I resigned after realizing the amount of changes that would require.
My current solution is to just patch lwip's tcp_input function and place a cache flush for the TCP header at each exit point.
This is no nice solution either, since it adds dependence on Xilinx API to LWIP - but it's the smaller patch.

but you have to understand that lwIP is working for many many people like it is now.

That is what I don't fully understand, how it is working for those many people. My guesses:
- either their driver just copies the RX packets after DMA (performance hit).
- or they have some form of cache-coherent DMA (but this requires special DMA hardware which most systems don't have).
- or, since it's a very rare race condition, it might accidentally not happen with their application's memory access pattern - but it could happen in the future.
- or it does happen, but they don't care if they lose a packet about once every hour, since TCP recovers from that loss automatically.

So the issue you have is an issue in your driver or port, not in lwIP.
As such, you probably should file a bug report with the xilinx port, not here...

I tried that two years ago, they don't react. Even If they did, the best solution would be to just copy the packets before handing them to lwip.

Regards, Bernhart.
Am 08.01.2018 um 11:10 schrieb Bernhart Pelger:

Hi Simon,

At first I also suspected the Xilinx driver to be at fault. But after tracing the problem I think
that Xilinx can't really correct the problem without a significant performance impact.

I mean, what is the Xilinx port driver (or any other driver for that matter) supposed to do with
received packets from the EMACs DMA ring buffer?

- Copy the all packets to a different memory location, just so LWIP can patch some fields in the TCP header?
- Completely disable the memory cache for all RX DMA buffers?

Both options would have a significant performance impact.

There is no real reason for writing into received packets anyways. You could byte-flip the TCP header fields on-the-fly when needed.
Please note that LWIP's UDP stack does not modify RX packets -- so actually it *is* possible to leave rx packets unmodified.

Honestly, I do not fully understand how this is not a problem for every lwip port driver, not just Xilinx.
Many systems issue an RX interrupt and use (non-cache-coherent) DMA to transfer data from EMAC to memory.
Since memory is not unlimited, new rx packets will at some point overwrite old rx packets (ring buffer).

As response to your post:

> How would you ensure no application writes into the application part of an rx pbuf?
- either by documenting that you are not allowed to modify rx pbufs.
- or by handing out 'const' pointers, (and documenting that you shouldn't cast away the 'const').

> Remember that rx pbufs are not 'const' for lwIP.
But they should be const.

Normally an application does not need to write into the received packets. I can't think of a good reason
for that, especially not for TCP, where you normally have to assemble/copy the data stream from multiple
pbuf payloads anyways! If you need to patch rx data at all, then why not patch the assembled stream data?

Patching rx data is also not good programming practice: multiple entities (EMAC and application/lwip) would write to the same memory.
This makes code less readable and can lead to race conditions.

Actually I tried to fix this issue by making all pointers to rx packet buffers 'const' in tcp_in.c, but I resigned after realizing the amount of changes that would require.
My current solution is to just patch lwip's tcp_input function and place a cache flush for the TCP header at each exit point.
This is no nice solution either, since it adds dependence on Xilinx API to LWIP - but it's the smaller patch.

but you have to understand that lwIP is working for many many people like it is now.
That is what I don't fully understand, how it is working for those many people. My guesses:
- either their driver just copies the RX packets after DMA (performance hit).
- or they have some form of cache-coherent DMA (but this requires special DMA hardware which most systems don't have).
- or, since it's a very rare race condition, it might accidentally not happen with their application's memory access pattern - but it could happen in the future.
- or it does happen, but they don't care if they lose a packet about once every hour, since TCP recovers from that loss automatically.

So the issue you have is an issue in your driver or port, not in lwIP.
As such, you probably should file a bug report with the xilinx port, not here...

I tried that two years ago, they don't react. Even If they did, the best solution would be to just copy the packets before handing them to lwip.

Cheers,
Simon

Bernhart Pelger wrote:

I've traced some data corruption due to a race condition
with the lwip library provided by Xilinx for ZYNQ (EMACPS).

In tcp_in.c: tcp_input() the "tcphdr" is patched
with byte-flipped versions of 'src', 'dest', 'seqno',
'ackno', 'flags', ....

This write access causes data corruption via the
following race condition:
    1. The EMACPS receives some packets and writes them
        via DMA to a ring of buffers (buffer descriptor ring).
    2. On a new packet, Xilinx' Interrupt handling gets active:
      3. detects that the Interrupt is from the EMACPS (GIC handler)
      4. detects that it is a RX interrupt (emacps_recv_handler())
      5. Invalidates the cache for the packet payload.
         This also voids any outstanding writes in the Cache.
      6. prepares a pbuf with it's payload pointing rx packet memory,
         and adds it to a queue to be processed by the LWIP stack.
     7. LWIP dequeues the received pbuf and processes it.
       8. in tcp_in.c, a few fields of tcp header in
          the pbuf payload get overwritten.

The problem is that in step 8, the memory cache gets dirty
with an outstanding write transaction to DDR Memory in the TCP
header area. At some point in time, ring buffer wraps around
and the EMACPS writes a new packet to the same DDR memory location
via DMA.

Now we still have an outstanding write transaction with an old
tcp header and new data in DDR3 memory, but the RX ISR has not yet
invalidated the cache for this location. If the DDR3 cache
gets flushed during this time period (between steps 2,3,4),
the newly received packet will be corrupted with the old tcp
header with old port numbers that are already byte-flipped.
Also if the cache line is 32 bytes wide, some neighbour bytes will also
get corrupted in the IP header and the TCP payload.
The tcp_input() function will then byte-flip the (old)
port numbers again, and will later discard the packet
due to invalid port number.

The most elegant solution would be if the LWIP TCP stack would
not write to the RX packet buffer - that's not required anyways
just for byte-flipping some tcp header values.
These values could be be left as-is and be byte-flipped only
when they are actually read.

Other solutions would be to require the driver to copy
the complete packet before handing it to LWIP - that
would however impact performance.

I tested this with the LWIP library that was included in
Xilinx SDK 2014.4 which is lwip 1.4.0 -
however the latest 2.0.3 also seems to modify tcp packets.

_______________________________________________
lwip-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/lwip-devel

_______________________________________________
lwip-devel mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/lwip-devel

-- 
--
Bernhart Pelger-Alzner
Entwicklungsingenieur Software
Tel:    +49 89 201804-261
Fax:    +49 89 201804-100
E-Mail: address@hidden

ASTYX GmbH
Lise-Meitner-Str. 2a, 85521 Ottobrunn, Germany
Postfach 1248, 85504 Ottobrunn, www.astyx.de
Reg.Nr.: München HRB 116 461, VAT: DE 186 893 572
CEO: Dipl.-Ing. G. Trummer

[Prev in Thread]

Current Thread

[Next in Thread]