qemu-devel

Re: [Qemu-devel] [Patch] Clear memory using memset instead of handcoded


From: Daniel Egger
Subject: Re: [Qemu-devel] [Patch] Clear memory using memset instead of handcoded loop
Date: Mon, 4 Oct 2004 02:15:53 +0200

On 03.10.2004, at 20:17, Karl Magdsick wrote:

> Is the section still a hot spot in your tests?

A short test after the change showed that it did indeed free a
few CPU cycles, which are now spent elsewhere. But since I now have
a bzero call in the profiles, this function has become another hotspot.

So it is an improvement, but the best option would be to reduce the
number of calls to memset or to the containing function.

> Maybe a macro or
> inline function would be more appropriate.  The macro/inline function
> could be defined to use memset for now, and later changed to use
> optimized inline assembly language on architectures that don't inline
> memset.  It is also likely possible to write a slightly faster inline
> assembly routine since we know that we want to always set the memory
> to zero, while memset has to allow for an arbitrary fill value.

It's very unlikely that a compiler will not optimize memset,
especially when the parameters are constant, since it has been in
the C standard since C89 and is used everywhere. I very much believe
that it is close to impossible to produce a better handcoded version,
either in assembly or in C: the provided memset is usually the
optimum for the platform even when used non-inline, so duplicating
it does not make sense. Handcoding it in ASM will always be
suboptimal, since the compiler will always schedule it as is,
without taking the surrounding code into account, which is typically
a huge loss compared to a C version.
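
For illustration, the wrapper idea could be sketched as a plain
static inline function that just forwards to memset, so the compiler
still sees the standard call. This is only a sketch: the name
zero_mem is hypothetical and not taken from the QEMU sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper name, chosen for this example only. */
static inline void zero_mem(void *p, size_t n)
{
    assert(p != NULL);
    /* Forward to the standard memset; the compiler remains free to
     * expand it inline or to call the library routine. */
    memset(p, 0, n);
}
```

A caller would write something like zero_mem(buf, sizeof buf), and the
body could be swapped out later without touching the call sites.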

Since memset is a builtin in GCC, the compiler can (and will!) use
the additional information provided by an explicit call of the
builtin for further optimisation where possible. Actually, the
current gcc (3.4) is still pretty stupid about this particular
optimisation, but since this is low-hanging (compiler) fruit, the
new tree optimizers in gcc 4.x will optimize the heck out of that.
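
As a rough sketch of the constant-parameter case: the two functions
below clear the same object, once with a handcoded loop and once with
memset. The struct and function names are made up for this example.
Whether a given compiler actually expands the memset inline depends on
its version and flags, so that part is an assumption to check in the
generated assembly.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct entry {
    unsigned long addr;
    unsigned int  flags;
    unsigned char data[32];
};

/* Handcoded loop: clears the object one byte at a time. */
static void clear_entry_loop(struct entry *e)
{
    unsigned char *p = (unsigned char *)e;
    size_t i;

    for (i = 0; i < sizeof *e; i++)
        p[i] = 0;
}

/* Same effect via memset. The fill value and the size are both
 * compile-time constants here, which is the information the
 * compiler can use when it treats memset as a builtin. */
static void clear_entry_memset(struct entry *e)
{
    assert(e != NULL);
    memset(e, 0, sizeof *e);
}
```

Both functions leave the object in the same all-zero state; only the
amount of information handed to the compiler differs.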

Servus,
      Daniel


