From: | Bernhard Voelker |
Subject: | Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection |
Date: | Fri, 23 Oct 2015 12:59:18 +0200 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 |
On 10/22/2015 05:55 PM, Eric Blake wrote:
> On 10/22/2015 09:47 AM, Bernhard Voelker wrote:
>>> Also I suspect the extra conditions involved in using longs for just
>>> the first 16 bytes would outweigh the benefits? I.E. the first simple
>>> loop probably breaks early, and if not has the added benefit of
>>> "priming the pumps" for the subsequent memcmp().
>>
>> what about spending some 16 bytes of memory and do the memcmp on the
>> whole buffer?
>>
>>   static unsigned char p[] = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
>>   return 0 == memcmp (p, buf, bufsize);
>
> Won't work over the whole bufsize for anything larger than 16 unless you
> do repeated memcmp()s. Or are you suggesting that the first 16-byte head
> validation be done against a static buffer via one memcmp(), followed by
> the other overlap-self memcmp() for the rest of the buffer?
>
> But I suspect that for short lengths, it is more efficient to do an
> unrolled loop than to make a function call (where the function call
> itself will probably just do an unrolled loop on the short length). You
> want the short case to be fast, and the real speedup comes by delegating
> as much of the long case as possible to the system memcmp()
> optimizations.

Of course, you're completely right. My example above was over-simplified
and therefore plain wrong, sorry.

Aiming at tools like dd(1), I played a bit with the idea of a pre-known
zeroed buffer in front of the real payload data, i.e. having a buffer of
16 + 64k bytes where the first 16 bytes are all NULs, and thus being able
to immediately use the overlap-self memcmp() with the payload starting at
offset 16.

Tests showed that you are right with your other suspicion, too: because
of the function call overhead, memcmp() is less efficient for small
buffer sizes than Rusty's way.

Therefore +1 for Padraig's patch.

Have a nice day,
Berny
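
For reference, the two variants discussed in this message could look
roughly like the sketch below. This is only an illustration, not the code
from the patch: the names is_nul(), is_nul_prefixed() and ZERO_PREFIX are
hypothetical, and the second function assumes the caller has already
zeroed the 16 bytes in front of the payload.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: number of bytes checked (or pre-zeroed) up front.  */
enum { ZERO_PREFIX = 16 };

/* Simple loop over the first 16 bytes, then the overlap-self memcmp():
   once buf[0..15] are known to be NUL, buf[i] == buf[i + 16] for all
   remaining i implies that the whole buffer is NUL.  */
static bool
is_nul (void const *buf, size_t length)
{
  unsigned char const *p = buf;
  size_t len;

  for (len = 0; len < ZERO_PREFIX; len++)
    {
      if (len == length)
        return true;            /* short buffer, every byte was NUL */
      if (p[len])
        return false;
    }

  /* Delegate the long case to the system memcmp().  */
  return memcmp (p, p + len, length - len) == 0;
}

/* Variant with a pre-zeroed prefix: BASE points to ZERO_PREFIX NUL bytes
   (an assumption the caller must guarantee) followed by PAYLOAD_LEN bytes
   of payload, so the overlap-self memcmp() can start immediately.  */
static bool
is_nul_prefixed (void const *base, size_t payload_len)
{
  unsigned char const *p = base;
  return memcmp (p, p + ZERO_PREFIX, payload_len) == 0;
}
```

With a 16 + 64k buffer as described above, a dd-like reader would fill the
payload starting at offset 16 and call is_nul_prefixed (buffer, nread);
for small reads, the plain loop in is_nul() avoids the memcmp() call
entirely.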