Re: Re: [lwip-users] lwIP Checksum routine

From: timmy brolin
Subject: Re: Re: [lwip-users] lwIP Checksum routine
Date: Tue, 15 Nov 2005 17:05:18 +0100

The checksum routine should really be written in assembly. By writing it in 
assembly you can take advantage of the carry flag, which C does not expose 
directly.

A very efficient assembly version will first load a big chunk of data into the 
registers using a "load multiple" instruction, then add all the 16- or 32-bit 
registers using an "add with carry" instruction, looping as many times as 
necessary.

Processors with 32-bit "add with carry" instructions can do a very fast checksum 
computation using this method, but even 16-bit "add with carry" instructions 
yield good results.
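
For readers without an assembly routine yet, here is a rough C sketch of the same idea. It is only an illustration: chksum_carry32 is a made-up name, not an lwIP function. Since C cannot read the carry flag, the sketch detects the carry with a comparison and adds it back in, which is what a chain of "add with carry" instructions does for free. The result is in memory (host) byte order and is not complemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of lwIP): ones'-complement sum of a
 * buffer, taken 32 bits at a time with the carry fed back in. */
static uint16_t chksum_carry32(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t sum = 0;

    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, 4);      /* alignment-safe 32-bit load */
        sum += w;
        sum += (sum < w);      /* wrapped around: add the carry back */
        p += 4;
        len -= 4;
    }

    if (len > 0) {             /* 1..3 trailing bytes, zero padded */
        uint32_t w = 0;
        memcpy(&w, p, len);
        sum += w;
        sum += (sum < w);
    }

    /* Fold the 32-bit sum down to 16 bits, twice to absorb the carry. */
    sum = (sum >> 16) + (sum & 0xffff);
    sum = (sum >> 16) + (sum & 0xffff);
    return (uint16_t)sum;
}
```

A hand-written assembly loop would replace the compare-and-add with the real carry flag and pull in several words per iteration with "load multiple"; whether the compiler produces something comparable from this C is worth checking in the generated code.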

If you are looking for other things to optimise, make sure routines such as 
memcpy and memset are performed using either DMA or "load/store multiple" 
assembly instructions.

/Timmy Brolin

-----Original Message-----
From: "Ashutosh Srivastava" <address@hidden>
To: "Mailing list for lwIP users" <address@hidden>
Date: Tue, 15 Nov 2005 12:26:13 +0530
Subject: Re: [lwip-users] lwIP Checksum routine

Thanks for this optimization info. I have already started coding the
checksum computation in my processor's assembly language.

Can anyone suggest any other critical parts of lwIP that give a
performance improvement when optimized in assembly?

  ----- Original Message ----- 
  From: Jim Gibbons 
  To: Mailing list for lwIP users 
  Sent: Tuesday, November 15, 2005 4:52 AM
  Subject: Re: [lwip-users] lwIP Checksum routine

  We did an optimization for one port (NiosII).  This is very CPU dependent.  
In our particular case, we did better with 16-bit accesses owing to a slow 
shifter.  We did the best by handling 8 half-words in one pass of an outer 
loop.  This allowed us to use small constant offsets that could be encoded in 
the load instructions, e.g., acc += data[0]; acc += data[1]; etc.  The loop 
overheads and the pointer update (data += 8) became a much smaller fraction of 
the CPU time taken.
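
  The unrolled loop described above might look roughly like the sketch below. This is not the actual NiosII port code. The name chksum_unrolled16 is hypothetical, the buffer is assumed to be 2-byte aligned, and len (in bytes) is assumed to stay below 64 KiB so the 32-bit accumulator cannot overflow before the final fold. The result is in host byte order and is not complemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: ones'-complement sum using 16-bit loads,
 * eight half-words per pass of the outer loop. */
static uint16_t chksum_unrolled16(const uint16_t *data, size_t len)
{
    uint32_t acc = 0;

    while (len >= 16) {
        /* Constant indices can be encoded as load offsets. */
        acc += data[0];
        acc += data[1];
        acc += data[2];
        acc += data[3];
        acc += data[4];
        acc += data[5];
        acc += data[6];
        acc += data[7];
        data += 8;             /* one pointer update per 8 half-words */
        len -= 16;
    }

    while (len >= 2) {         /* remaining whole half-words */
        acc += *data++;
        len -= 2;
    }

    if (len > 0) {             /* odd trailing byte, zero padded */
        uint16_t last = 0;
        memcpy(&last, data, 1);
        acc += last;
    }

    /* Fold the carries back into the low 16 bits. */
    acc = (acc >> 16) + (acc & 0xffff);
    acc = (acc >> 16) + (acc & 0xffff);
    return (uint16_t)acc;
}
```

  The unroll factor of 8 is simply the one that worked best in that port; it is worth measuring a few values on your own CPU.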

  But, as I said, this stuff is very CPU dependent.  Considering that, I think 
that the core code is as it should be.  

  It's a simple thing to change for your particular CPU, so I would urge you to 
do so.  I would also urge you to try a couple of different things and measure 
your results.  We were surprised when we found that full word accesses weren't 
good for us, and you may find some surprising things with your CPU.

  You might also want to check your ethernet chip.  Some of the newer ones can 
assist you at the time of transmission.

  Good luck!

  Sathya Thammanur wrote: 
    Hi all,
    The lwip_chksum() function in lwip/src/core/inet.c seems to be unoptimized. 
It does halfword reads and additions. Wouldn't it be better to do word 
accesses, and hence word additions? There would be some prologue and epilogue 
code to bring the buffer from a halfword boundary to a word boundary. Has anyone 
tried this for any of their ports? Or am I missing something here?


lwip-users mailing list

        Jim Gibbons
        Gibbons and Associates, Inc.
        900 Lafayette, Suite 704, Santa Clara, CA
        TEL: (408) 984-1441
        FAX: (408) 247-6395

