From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
Date: Thu, 22 Oct 2015 16:44:04 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 22/10/2015 16:37, Eric Blake wrote:
>> > +  /* Check first 16 bytes manually.  */
>> > +  for (len = 0; len < 16; len++)
>> > +    {
>> > +      if (! bufsize)
>> > +        return true;
>> > +      if (*p)
>> > +        return false;
>> > +      p++;
>> > +      bufsize--;
>> > +    }
>> > +
>> > +  /* Now we know that's zero, memcmp with self.  */
>> > +  return memcmp (buf, p, bufsize) == 0;
>> >  }
> Cool trick of using a suitably-aligned overlap-to-self check to then
> trigger platform-specific speedups without having to rewrite them by
> hand!  qemu is doing a similar check in util/cutils.c:buffer_is_zero()
> that could probably benefit from the same idea.

Nice trick indeed.  On the other hand, the first 16 bytes are enough to
rule out 99.99% (a number pulled out of thin air) of the non-zero
blocks, so that's where you want to optimize.  Checking them an unsigned
long at a time, or fetching a few unsigned longs and ORing them
together, would probably be the best of both worlds, because you then
only use the FPU (through the vectorized memcmp) in the rare case of a
zero buffer.
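
A minimal sketch of that combination, to make the idea concrete.  The
name is_nul_sketch is hypothetical, and this is neither the coreutils
patch nor qemu's buffer_is_zero(); it assumes that two 64-bit loads
through memcpy are a cheap enough way to inspect the first 16 bytes and
that the compiler lowers those memcpy calls to plain loads:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return true if BUF[0..BUFSIZE) is all zero.  */
static bool
is_nul_sketch (const void *buf, size_t bufsize)
{
  const unsigned char *p = buf;
  uint64_t a, b;
  size_t i;

  /* Short buffers: a plain byte loop, nothing to overlap with.  */
  if (bufsize < 16)
    {
      for (i = 0; i < bufsize; i++)
        if (p[i])
          return false;
      return true;
    }

  /* Fetch the first 16 bytes as two words and OR them together.
     memcpy keeps the loads valid for an unaligned BUF.  Most
     non-zero buffers are rejected here with integer code only.  */
  memcpy (&a, p, sizeof a);
  memcpy (&b, p + sizeof a, sizeof b);
  if (a | b)
    return false;

  /* The first 16 bytes are zero, so compare the buffer with itself
     shifted by 16: each byte must equal the byte 16 positions
     before it, hence the whole buffer is zero.  */
  return memcmp (p, p + 16, bufsize - 16) == 0;
}
```

With this shape the memcmp call, and whatever vector code the C library
uses for it, is only reached once the leading 16 bytes are known to be
zero.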

Paolo


