qemu-devel
[Top][All Lists]

## Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function

 From: Claudio Fontana Subject: Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function Date: Mon, 18 May 2015 10:36:40 +0200

```Hello Eric,

On 15 May 2015 at 18:01, Eric Blake <address@hidden> wrote:
> On 05/15/2015 09:13 AM, Claudio Fontana wrote:
>
>>> Or back to the original question - why are we worrying about the O(n)
>>> memmem implementation when the naive O(n^2) is MUCH shorter and easier
>>> to understand, and where the scaling penalty is only apparent on really
>>> long corner case strings?  Is our use of memmem going to hit the corner
>>> cases where scaling matters?
>>
>> For the specific use case here (memory search functionality),
>> the haystack is normally the factor which really can scale up a lot.
>
> Remember, when searching, the worst-case penalty is determined by a
> combination of the size of the haystack (H) and the size of the needle
> (I'll choose M to represent that, to avoid confusion with the generic n
> in Big-O notation).  The most brain-dead loop (for each byte in
> haystack, see if the needle matches) is H*M operations; this is
> noticeable as quadratic scaling (and even then, only for certain
> matching patterns, such as determining if a needle of all 'a' except for
> a final 'b' occurs in a haystack of all 'a') only if both needle and
> haystack can grow arbitrarily large: O(H*M) == O(n^2).  Where my code
> shines is that it breaks the operation into two parts - a factorization
> of the needle O(M) == O(n), then a linear search using that factored
> needle O(H) == O(n), for an overall cost O(H+M) == O(n). But with the
> naive algorithm, if the needle is capped to a fixed size, then you have
> O(constant*H) == O(H) == O(n).
>
> Meanwhile, remember that Big-O notation is only about scaling effects,
> but that there are still constant factors to be considered - an O(n)
> algorithm that requires 3 comparisons per byte of the haystack will
> always be slower than an O(n) algorithm that only needs 2 comparisons
> per byte of the haystack, even though both algorithms scale linearly.
>
>>
>> I would expect the needle to be usually smallish (though for string search
>> sometimes the needle could probably arrive at the threshold of 32).
>>
>> When scanning a very large memory range, considering a binary string with
>> roughly length around 32, do you think that the non-trivial implementation
>> could bring some noticeable benefits?
>>
>> If it's below 32, do we get any benefits compared to the trivial
>> implementation?
>
> In fact, one of the glibc optimizations was realizing that my two-way
> algorithm code is LOUSY at short needles.  It has a high relative
> overhead for finding the proper factorization of a short needle, before
> it can even start searching.  A hybrid approach (brute force a short
> needle, and only bother with factorizing a long needle, or even after a
> heuristic of a minimum number of comparisons is first met) is ideal,
> because of the observation above that the poor scaling of quadratic
> behavior is only a problem for large needles, while the constant
> overhead of factorization is measurably larger in proportion to the rest
> of the work done on short needles.
>
> If you are letting the user search for arbitrarily long needles, then an
> O(n) scale may be worth the speed penalty on shorter needles, or the
> complexity of a hybrid approach (again, I think glibc may have better
> hybrid approach than gnulib at the moment).  But as this is an HMP
> interface, the user has to type the search string, and is unlikely to be
> typing long needles, so it is justifiable to just ditch the complexity
> of a hybrid approach and use ONLY the naive O(n^2) algorithm, because as
> I argued above, it is still observable only as O(n) for a bounded-size
> needle.
>
>>
>>>
>>> [Personal note: I'm still tickled pink every time I see yet another
>>> project adopt my O(n) code - I didn't come up with the algorithm, but
>>> merely scraped it from public research, and was shocked to note how many
>>> libc still had O(n^2) code at the time.  But it is still gratifying to
>>> see that my implementation is getting such wide adoption]
>>>
>>
>> Well your code and its derived versions have been tried and tested a lot, so
>> picking it up just makes sense to me, to avoid getting into the same corner
>> am a lazy person of course!
>
> You do have a point here - if we are going to reuse code, then reusing a
> debugged implementation rather than writing from scratch is the way to
> go.  But my point remains that the complexity of a reused Two-Way
> algorithm may not be necessary, and the 3-line naive algorithm is
> trivial to review.
>
> What's more, it is also arguable that we only use
> the naive algorithm on less-than-stellar platforms that lack memmem
> natively (well, we may also use less-than-stellar performance if the
> native memmem is not O(n)) - but who is going to be using the HMP
> command on such a platform?  And maybe those platforms will finally be
> convinced to add a decent memmem if they realize that we are slow on
> their platform but not on other platforms, precisely because we are
> working around their lack of memmem by intentionally using a slow but
> trivial replacement.

Good points, I will respin with a trivial implementation shortly.

Thanks!

Claudio

```