Re: Re: [lwip-users] lwIP Checksum routine
From: timmy brolin
Subject: Re: Re: [lwip-users] lwIP Checksum routine
Date: Tue, 15 Nov 2005 17:05:18 +0100
The checksum routine should really be written in assembly. By writing it in
assembly you can take advantage of the carry flag, which standard C does not
expose directly.
A very efficient assembly version will first load a big chunk of data into the
registers using a "load multiple" instruction, then add all the 16- or 32-bit
registers using an "add with carry" instruction, looping as many times as
necessary.
Processors with 32-bit "add with carry" instructions can do a very fast checksum
computation using this method, but even 16-bit "add with carry" instructions
yield good results.
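For readers who want to experiment before dropping to assembly, the carry-accumulating idea can be approximated in C by summing 32-bit words into a wider accumulator and folding the collected carries back in at the end. This is only a sketch under assumptions: chksum_wide_acc is a hypothetical name, not an lwIP function, and it returns the folded (not yet complemented) one's-complement sum of the buffer's native 16-bit words.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of lwIP.
 * Sums 32-bit words into a 64-bit accumulator so that carries pile up in
 * the upper half instead of being lost, then folds them back in
 * (end-around carry). Returns the folded sum of the native 16-bit words;
 * the caller still has to complement it. */
static uint16_t chksum_wide_acc(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint64_t acc = 0;

    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, sizeof w);      /* alignment-safe 32-bit load */
        acc += w;
        p += 4;
        len -= 4;
    }
    if (len >= 2) {
        uint16_t h;
        memcpy(&h, p, sizeof h);
        acc += h;
        p += 2;
        len -= 2;
    }
    if (len) {
        uint16_t h = 0;
        memcpy(&h, p, 1);             /* odd trailing byte, zero-padded */
        acc += h;
    }

    /* Fold 64 -> 32 -> 16 bits, adding the carries back each time. */
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffu) + (acc >> 16);
    acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}
```

Whether this gets close to a hand-written "add with carry" loop depends on the compiler and CPU, so it is worth measuring both.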
If you are looking for other things to optimise, make sure routines such as
memcpy and memset are performed using either DMA or "load/store multiple"
assembly instructions.
/Timmy Brolin
-----Original Message-----
From: "Ashutosh Srivastava" <address@hidden>
To: "Mailing list for lwIP users" <address@hidden>
Date: Tue, 15 Nov 2005 12:26:13 +0530
Subject: Re: [lwip-users] lwIP Checksum routine
Thanks for this optimization info. I have already started on coding the
checksum computation in my processor's assembly.
Can anyone suggest any other critical part of lwIP that gives a
performance gain when optimized in assembly?
Thanks,
Ashutosh
----- Original Message -----
From: Jim Gibbons
To: Mailing list for lwIP users
Sent: Tuesday, November 15, 2005 4:52 AM
Subject: Re: [lwip-users] lwIP Checksum routine
We did an optimization for one port (NiosII). This is very CPU dependent.
In our particular case, we did better with 16-bit accesses owing to a slow
shifter. We did the best by handling 8 half-words in one pass of an outer
loop. This allowed us to use small constant offsets that could be encoded in
the load instructions, e.g., acc += data[0]; acc += data[1]; etc. The loop
overheads and the pointer update (data += 8) became a much smaller fraction of
the CPU time taken.
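The unrolled loop described above might look roughly like the following. This is a sketch only, not the actual NiosII port code: chksum_unrolled8 is a hypothetical name, the buffer is assumed to be 16-bit aligned, and the length is given in half-words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the NiosII port's code.
 * data must be 16-bit aligned; nhalf is the number of half-words.
 * A 32-bit accumulator can absorb 65536 half-words (128 KiB) before it
 * could overflow, which is assumed to be more than any packet needs. */
static uint16_t chksum_unrolled8(const uint16_t *data, size_t nhalf)
{
    uint32_t acc = 0;

    /* Outer loop: 8 half-words per pass, using small constant offsets
     * so the loop test and pointer update are paid once per 8 loads. */
    while (nhalf >= 8) {
        acc += data[0];
        acc += data[1];
        acc += data[2];
        acc += data[3];
        acc += data[4];
        acc += data[5];
        acc += data[6];
        acc += data[7];
        data += 8;
        nhalf -= 8;
    }

    /* Remaining 0..7 half-words. */
    while (nhalf > 0) {
        acc += *data++;
        nhalf--;
    }

    /* Fold the carries collected in the upper 16 bits back in. */
    acc = (acc & 0xffffu) + (acc >> 16);
    acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}
```

The result is the folded sum of the half-words as stored in memory; complementing it and handling an odd trailing byte are left to the caller.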
But, as I said, this stuff is very CPU dependent. Considering that, I think
that the core code is as it should be.
It's a simple thing to change for your particular CPU, so I would urge you to
do so. I would also urge you to try a couple of different things and measure
your results. We were surprised when we found that full word accesses weren't
good for us, and you may find some surprising things with your CPU.
You might also want to check your ethernet chip. Some of the newer ones can
assist you at the time of transmission.
Good luck!
Sathya Thammanur wrote:
Hi all,
The lwip_chksum() function in lwip/src/core/inet.c seems to be unoptimized.
It does halfword reads and additions. Wouldn't it be better to do word
accesses, and hence word additions? There would be some prologue and epilogue
code to bring the buffer from a halfword boundary to a word boundary. Has
anyone tried doing this for any of their ports? Or am I missing something here?
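To make the question concrete, a word-access variant with the prologue and epilogue could be sketched as below. This is an illustration under assumptions, not lwIP code: chksum_word_access and add32c are hypothetical names, the buffer is assumed to start on at least a halfword boundary, and the fixed-size memcpy calls stand in for the direct aligned loads a real port would probably use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 32-bit add with end-around carry. If the addition
 * wraps, the lost carry is added back into the low end. */
static uint32_t add32c(uint32_t acc, uint32_t v)
{
    acc += v;
    if (acc < v) {
        acc++;
    }
    return acc;
}

/* Hypothetical sketch, not part of lwIP. Returns the folded (not yet
 * complemented) sum of the buffer's native 16-bit words. */
static uint16_t chksum_word_access(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t acc = 0;

    /* Prologue: consume one halfword if needed to reach a word boundary. */
    if (((uintptr_t)p & 2u) && len >= 2) {
        uint16_t h;
        memcpy(&h, p, sizeof h);
        acc = add32c(acc, h);
        p += 2;
        len -= 2;
    }

    /* Main loop: one 32-bit access and one addition per two halfwords. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        acc = add32c(acc, w);
        p += 4;
        len -= 4;
    }

    /* Epilogue: a leftover halfword and/or an odd trailing byte. */
    if (len >= 2) {
        uint16_t h;
        memcpy(&h, p, sizeof h);
        acc = add32c(acc, h);
        p += 2;
        len -= 2;
    }
    if (len) {
        uint16_t h = 0;
        memcpy(&h, p, 1);             /* zero-padded last byte */
        acc = add32c(acc, h);
    }

    /* Fold 32 -> 16 bits. */
    acc = (acc & 0xffffu) + (acc >> 16);
    acc = (acc & 0xffffu) + (acc >> 16);
    return (uint16_t)acc;
}
```

Because the prologue consumes a whole halfword, the pairing of bytes into 16-bit words is unchanged, so the result matches a plain halfword loop over the same buffer.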
Thanks,
Sathya
----------------------------------------------------------------------------
_______________________________________________
lwip-users mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/lwip-users
--
Jim Gibbons
address@hidden
Gibbons and Associates, Inc.
TEL: (408) 984-1441
900 Lafayette, Suite 704, Santa Clara, CA
FAX: (408) 247-6395