
Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function

From: Eric Blake
Subject: Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function
Date: Fri, 15 May 2015 10:01:39 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

On 05/15/2015 09:13 AM, Claudio Fontana wrote:

>> Or back to the original question - why are we worrying about the O(n)
>> memmem implementation when the naive O(n^2) is MUCH shorter and easier
>> to understand, and where the scaling penalty is only apparent on really
>> long corner case strings?  Is our use of memmem going to hit the corner
>> cases where scaling matters?
>
> For the specific use case here (memory search functionality),
> the haystack is normally the factor which really can scale up a lot.

Remember, when searching, the worst-case penalty is determined by a
combination of the size of the haystack (H) and the size of the needle
(I'll choose M to represent that, to avoid confusion with the generic n
in Big-O notation).  The most brain-dead loop (for each byte in the
haystack, see if the needle matches there) is H*M operations in the
worst case; this shows up as quadratic scaling only if both needle and
haystack can grow arbitrarily large, and even then only for certain
pathological patterns, such as searching for a needle of all 'a' except
a final 'b' in a haystack of all 'a': O(H*M) == O(n^2).  Where my code
shines is that it breaks the operation into two parts - a factorization
of the needle O(M) == O(n), then a linear search using that factored
needle O(H) == O(n), for an overall cost O(H+M) == O(n). But with the
naive algorithm, if the needle is capped to a fixed size, then you have
O(constant*H) == O(H) == O(n).

Meanwhile, remember that Big-O notation is only about scaling effects,
but that there are still constant factors to be considered - an O(n)
algorithm that requires 3 comparisons per byte of the haystack will
always be slower than an O(n) algorithm that only needs 2 comparisons
per byte of the haystack, even though both algorithms scale linearly.

>
> I would expect the needle to be usually smallish (though for string search
> sometimes the needle could probably arrive at the threshold of 32).
>
> When scanning a very large memory range, considering a binary string with
> roughly length around 32, do you think that the non-trivial implementation
> could bring some noticeable benefits?
>
> If it's below 32, do we get any benefits compared to the trivial
> implementation?

In fact, one of the glibc optimizations was realizing that my two-way
algorithm code is LOUSY at short needles.  It has a high relative
overhead for finding the proper factorization of a short needle, before
it can even start searching.  A hybrid approach (brute force a short
needle, and only bother factorizing a long needle, or even defer
factorization until a heuristic minimum number of comparisons has been
made) is ideal,
because of the observation above that the poor scaling of quadratic
behavior is only a problem for large needles, while the constant
overhead of factorization is measurably larger in proportion to the rest
of the work done on short needles.

If you are letting the user search for arbitrarily long needles, then an
O(n) algorithm may be worth the speed penalty on shorter needles, or the
complexity of a hybrid approach (again, I think glibc may have better
hybrid approach than gnulib at the moment).  But as this is an HMP
interface, the user has to type the search string, and is unlikely to be
typing long needles, so it is justifiable to just ditch the complexity
of a hybrid approach and use ONLY the naive O(n^2) algorithm, because as
I argued above, it is still observable only as O(n) for a bounded-size
needle.

>
>>
>> [Personal note: I'm still tickled pink every time I see yet another
>> project adopt my O(n) code - I didn't come up with the algorithm, but
>> merely scraped it from public research, and was shocked to note how many
>> libc still had O(n^2) code at the time.  But it is still gratifying to
>> see that my implementation is getting such wide adoption]
>>
>
> Well your code and its derived versions have been tried and tested a lot, so
> picking it up just makes sense to me, to avoid stepping into the same corner
> cases again - I am a lazy person of course!

You do have a point here - if we are going to reuse code, then reusing a
debugged implementation rather than writing from scratch is the way to
go.  But my point remains that the complexity of a reused Two-Way
algorithm may not be necessary, and the 3-line naive algorithm is
trivial to review.  What's more, we would only be using the naive
algorithm on less-than-stellar platforms that lack memmem natively
(well, we may also get less-than-stellar performance if the
native memmem is not O(n)) - but who is going to be using the HMP
command on such a platform?  And maybe those platforms will finally be
convinced to add a decent memmem if they realize that we are slow on
their platform but not on other platforms, precisely because we are
working around their lack of memmem by intentionally using a slow but
trivial replacement.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

