AW: memcpy is not optimal implemented
From: Wallner, Jens
Subject: AW: memcpy is not optimal implemented
Date: Fri, 16 Mar 2001 11:17:59 +0100
> "Ulrich Drepper" wrote:
>
> > the current memcpy function for i686 is not optimally implemented.
>
> That's not correct. The underlying assumptions are different. If the
> number of bytes is not a multiple of the word size we make no
> assumptions on the alignment of the pointer. This might or might not
> be correct.
I agree that the change from i586 to i686, where the transfer loop was
replaced with a simple "rep; movsl" instruction, was a very good idea,
because in most cases it is the fastest way.
But in the current memcpy function 1-3 bytes are moved BEFORE the
transfer loop if the block length is not a multiple of the double word
size. This causes an unaligned transfer in "rep; movsl", even for
pointers that were aligned to begin with.
Most block transfers are double word aligned and have a length that is
a multiple of a double word, EXCEPT when you operate on strings. Those
are only aligned at the beginning.
The test results below show what an unaligned transfer loop costs:
(1) 1MB copied, aligned
(2) 1MB copied, unaligned

                          (1) MB/s      (2) MB/s
PIII/500 (BX)             225.959083    178.181339
K7/500 (AMD 751)          243.643939    128.827630
K7/1200 (VIA KT133)       280.478946    185.844935
PPro/200 (Natoma)          82.704230     61.444197
K6/400 (HX)                50.549345     49.723512
PIII/666 (i820/RAMBUS)    410.751840    242.228698
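The figures above are MB/s for a 1 MB block. As a rough illustration of how
numbers of this kind can be obtained, here is a minimal C sketch. It is not
the program that produced the table; the name copy_mb_per_s, the copy_fn
type and the repetition count are made up for this example, and clock()
measures CPU time rather than wall-clock time.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy and any replacement routine under test. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: copies len bytes reps times with fn and returns
   the throughput in MB/s, or -1.0 if it could not be measured. */
static double copy_mb_per_s(copy_fn fn, size_t len, int reps)
{
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    copy_fn volatile f = fn;   /* volatile so the calls are not optimized away */
    clock_t t0, t1;
    double secs;
    int i, ok;

    if (src == NULL || dst == NULL || len == 0) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, len);
    memset(dst, 0, len);

    t0 = clock();
    for (i = 0; i < reps; i++)
        f(dst, src, len);
    t1 = clock();

    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    ok = (dst[len - 1] == 0x5a);   /* sanity check: the last byte arrived */
    free(src);
    free(dst);
    if (!ok || secs <= 0.0)
        return -1.0;
    return (double)len * reps / (1024.0 * 1024.0) / secs;
}
```

One would call it once with a length that is a multiple of 4, for example
copy_mb_per_s(memcpy, 1 << 20, 200), and once with a length that is not,
for example (1 << 20) - 1, and compare the two results. Which length or
pointer offset corresponds to case (2) in the table is an assumption left
to the caller.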
If you move the 1-3 bytes AFTER the "rep; movsl" transfer loop, strings
are always copied aligned (without offset), independent of their
length.
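The effect of the two orderings on alignment can be modelled in C. This is
only a sketch under my own assumptions, not glibc code: copy_head_first and
copy_tail_last are made-up names, and the 4-byte memcpy calls merely stand
in for the double word moves of "rep; movsl".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Current ordering: copy the len % 4 odd bytes first, then whole words.
   *loop_off receives the source offset from a 4-byte boundary at the
   moment the word loop would start. */
static void *copy_head_first(void *dst, const void *src, size_t len,
                             unsigned *loop_off)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    for (i = 0; i < (len & 3); i++)      /* stands in for movsb / movsw */
        *d++ = *s++;
    *loop_off = (unsigned)((uintptr_t)s & 3);
    for (i = 0; i < (len >> 2); i++) {   /* stands in for rep; movsl */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    return dst;
}

/* Proposed ordering: whole words first, the len % 4 odd bytes last. */
static void *copy_tail_last(void *dst, const void *src, size_t len,
                            unsigned *loop_off)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    *loop_off = (unsigned)((uintptr_t)s & 3);
    for (i = 0; i < (len >> 2); i++) {   /* stands in for rep; movsl */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    for (i = 0; i < (len & 3); i++)      /* stands in for rep; movsb */
        *d++ = *s++;
    return dst;
}
```

For a source that starts on a 4-byte boundary and a length of 15 bytes,
copy_head_first reports an offset of 3 at the start of the word loop, while
copy_tail_last reports 0. Both produce the same copy.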
> "Ulrich Drepper" wrote:
>
> You can collect some statistical data (very easy using
> LD_PRELOAD and RTLD_NEXT) and convince me.
I will try to find or write some practical test programs that
operate intensively on strings and post the results. But
I can do this at the weekend at the earliest.
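The statistic worth collecting is how often a copy starts double word
aligned but has a length that is not a multiple of 4. Below is a minimal
sketch of just the counting part. The struct copy_stats and the function
tally_copy are hypothetical names; the LD_PRELOAD wrapper that would call
tally_copy before forwarding to the real memcpy (looked up with
dlsym(RTLD_NEXT, "memcpy")) is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters, not part of glibc. */
struct copy_stats {
    unsigned long calls;          /* all copies seen */
    unsigned long aligned_start;  /* src and dst both 4-byte aligned */
    unsigned long odd_length;     /* len is not a multiple of 4 */
    unsigned long affected;       /* aligned start, odd length, len >= 4 */
};

/* Classify one copy request and update the counters. */
static void tally_copy(struct copy_stats *st, const void *dst,
                       const void *src, size_t len)
{
    int aligned = ((((uintptr_t)dst | (uintptr_t)src) & 3) == 0);
    int odd = ((len & 3) != 0);

    st->calls++;
    st->aligned_start += (unsigned long)aligned;
    st->odd_length += (unsigned long)odd;
    /* Only these copies become unaligned when the odd bytes go first;
       below 4 bytes the word loop does not run at all. */
    st->affected += (unsigned long)(aligned && odd && len >= 4);
}
```

The ratio affected / calls for a string-heavy program would then show how
often the ordering of the odd bytes matters in practice.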
P.S. Here is my proposal for the code changes again.
Instead of this (sysdeps/i386/i686/memcpy.S):
        shrl    $1, %ecx
        jnc     1f
        movsb                   // byte transfer BEFORE
1:      shrl    $1, %ecx
        jnc     2f
        movsw                   // word transfer BEFORE
2:      rep
        movsl                   // main copy loop
use this:
        shrl    $2, %ecx
        rep; movsl              // main copy loop
        movl    12(%esp), %ecx
        andl    $3, %ecx
        rep; movsb              // byte transfer AFTER
(it is also shorter and does the same :-)
Jens Wallner
______________________________
sci-worx GmbH
System Solution Center Hamburg
Helmsweg 14-16
21218 Seevetal
Germany
Tel +49 (0)4105 5568-24
Fax +49 (0)4105 5568-22
Mailto:address@hidden
http://www.sci-worx.com