bug-glibc
AW: memcpy is not optimal implemented


From: Wallner, Jens
Subject: AW: memcpy is not optimal implemented
Date: Fri, 16 Mar 2001 11:17:59 +0100

> "Ulrich Drepper" wrote:
>
> > the current memcpy function for i686 is not optimally implemented.
> 
> That's not correct.  The underlying assumptions are different.  If the
> number of bytes is not a multiple of the word size we make no
> assumptions on the alignment of the pointer.  This might or might not
> be correct.

I agree that the change from i586 to i686, where the transfer loop was
replaced with a simple "rep; movsl" instruction, was a very good idea,
because in most cases it is the fastest way.
But in the current memcpy function, 1-3 bytes are moved BEFORE the
transfer loop if the block length is not a multiple of the double-word
size. This causes an unaligned transfer in "rep; movsl".
Most block transfers are double-word aligned and have a length that is
a multiple of a double word, EXCEPT when you operate on strings. Those
are aligned only at the beginning.
The test results below show what an unaligned transfer loop costs:

(1) 1 MB copied, aligned
(2) 1 MB copied, unaligned

CPU/MHz (chipset)        (1) MB/s      (2) MB/s
PIII/500 (BX)            225.959083    178.181339
K7/500 (AMD 751)         243.643939    128.827630
K7/1200 (VIA KT133)      280.478946    185.844935
PPro/200 (Natoma)         82.704230     61.444197
K6/400 (HX)               50.549345     49.723512
PIII/666 (i820/RAMBUS)   410.751840    242.228698

If you move the 1-3 bytes AFTER the "rep; movsl" transfer loop, a
string that starts on a double-word boundary is always copied aligned
(without offset), independent of its length.
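
The offset argument can be written down as a small C sketch. This is
only the arithmetic, not a measurement; the function names are made up
for illustration, and "addr" stands for the address of either buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset (0..3) from a 4-byte boundary at which the double-word loop
   starts when the leftover 1-3 bytes are copied BEFORE the loop:
   the pointer has already advanced by (n mod 4) bytes. */
static unsigned loop_offset_head_first(uintptr_t addr, size_t n)
{
    return (unsigned)((addr + (n & 3)) & 3);
}

/* The same offset when the leftover bytes are copied AFTER the loop:
   the loop starts at the original address, whatever the length is. */
static unsigned loop_offset_tail_last(uintptr_t addr, size_t n)
{
    (void)n;                      /* the length has no influence here */
    return (unsigned)(addr & 3);
}
```

For an aligned buffer and a 13-byte string, the first function gives an
offset of 1 and the second an offset of 0.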

> "Ulrich Drepper" wrote:
>
> You can collect some statistical data (very easy using
> LD_PRELOAD and RTLD_NEXT) and convince me.

I will try to find or write some practical test programs that
operate intensively on strings and post the results to you. But
I can do this at the weekend at the earliest.
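
A minimal sketch of how such statistics might be collected with
LD_PRELOAD and RTLD_NEXT follows. The counter layout, the names
record_memcpy and memcpy_stats, and the MEMCPY_STATS_PRELOAD build
macro are assumptions made for this illustration; the wrapper is not
thread-safe and is not meant as a finished tool.

```c
#define _GNU_SOURCE            /* makes RTLD_NEXT visible on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters for the cases discussed in this thread. */
struct memcpy_stats {
    unsigned long calls;            /* all memcpy calls seen            */
    unsigned long aligned_ptrs;     /* src and dst both 4-byte aligned  */
    unsigned long aligned_odd_len;  /* ...and length not a multiple of 4 */
};

static struct memcpy_stats stats;

/* Classify one call; kept separate so it can be checked on its own. */
static void record_memcpy(const void *dst, const void *src, size_t n)
{
    stats.calls++;
    if ((((uintptr_t)dst | (uintptr_t)src) & 3) == 0) {
        stats.aligned_ptrs++;
        if (n & 3)
            stats.aligned_odd_len++;
    }
}

#ifdef MEMCPY_STATS_PRELOAD
typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Interposed memcpy: count the call, then forward to the next memcpy
   in symbol lookup order (normally the one in libc). */
void *memcpy(void *dst, const void *src, size_t n)
{
    static memcpy_fn real;
    static int resolving;

    record_memcpy(dst, src, n);
    if (!real && !resolving) {
        resolving = 1;
        real = (memcpy_fn)dlsym(RTLD_NEXT, "memcpy");
        resolving = 0;
    }
    if (!real) {
        /* Fallback while resolving; volatile keeps the compiler from
           turning this loop back into a memcpy call. */
        volatile unsigned char *d = dst;
        const volatile unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }
    return real(dst, src, n);
}

/* Print the counters when the traced program exits. */
__attribute__((destructor))
static void report(void)
{
    fprintf(stderr, "memcpy: %lu calls, %lu aligned, %lu aligned with odd length\n",
            stats.calls, stats.aligned_ptrs, stats.aligned_odd_len);
}
#endif
```

Built as a shared object with the macro defined (for example with
"gcc -shared -fPIC -DMEMCPY_STATS_PRELOAD", plus "-ldl" on older
glibc), the library would be named in LD_PRELOAD when starting the
program under test.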

P.S. Here is my proposal for the code change again.
Instead of this (sysdeps/i386/i686/memcpy.S):

        shrl    $1, %ecx
        jnc     1f
        movsb                   // byte transfer BEFORE
1:      shrl    $1, %ecx
        jnc     2f
        movsw                   // word transfer BEFORE
2:      rep
        movsl                   // main copy loop

use this:

      shrl    $2, %ecx
      rep; movsl                // main copy loop
      movl    12(%esp), %ecx
      andl    $3, %ecx
      rep; movsb                // byte transfer AFTER

(It is also shorter and does the same thing :-)
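
That both sequences copy the same bytes can be checked with a rough C
model. This is a sketch only: the function names are invented here, and
the fixed-size memcpy calls merely stand in for movsw and movsl.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the current sequence: movsb if bit 0 of the length is set,
   movsw if bit 1 is set, then n/4 double words ("rep; movsl"). */
static void copy_head_first(unsigned char *dst, const unsigned char *src,
                            size_t n)
{
    if (n & 1) {
        *dst++ = *src++;                       /* movsb */
    }
    if (n & 2) {
        memcpy(dst, src, 2);                   /* movsw */
        dst += 2;
        src += 2;
    }
    for (size_t i = n >> 2; i > 0; i--) {      /* rep; movsl */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
}

/* Model of the proposed sequence: n/4 double words first, then the
   remaining n mod 4 bytes ("rep; movsb"). */
static void copy_tail_last(unsigned char *dst, const unsigned char *src,
                           size_t n)
{
    for (size_t i = n >> 2; i > 0; i--) {      /* rep; movsl */
        memcpy(dst, src, 4);
        dst += 4;
        src += 4;
    }
    for (size_t i = n & 3; i > 0; i--) {       /* rep; movsb */
        *dst++ = *src++;
    }
}
```

The only difference is where the double-word loop begins: at the start
of the buffers in the second function, and 0-3 bytes into them in the
first.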

Jens Wallner
______________________________
sci-worx GmbH
System Solution Center Hamburg
Helmsweg 14-16
21218 Seevetal
Germany
Tel +49 (0)4105 5568-24
Fax +49 (0)4105 5568-22
Mailto:address@hidden
http://www.sci-worx.com   


