[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function

From: Claudio Fontana
Subject: Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function
Date: Fri, 15 May 2015 17:13:42 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.0.1

Hello Eric,

On 15.05.2015 16:56, Eric Blake wrote:
> On 05/15/2015 08:12 AM, Paolo Bonzini wrote:
>> On 15/05/2015 15:57, Claudio Fontana wrote:
>>> The header here mentions GPLv2+, but the module data for memmem in gnulib 
>>> mentions LGPLv2+.
>>> Very confusing.
>>> The COPYING file in gnulib mentions:
>>> "The files in here are mostly copyright (C) Free Software Foundation, and
>>> are under assorted licenses.  Mostly, but not entirely, GPL.
>>> Many modules are provided dual-license, either GPL or LGPL at your
>>> option.  The headers of files in the lib directory (e.g., lib/error.c)
>>> state GPL for convenience, since the bulk of current gnulib users are
>>> GPL'd programs.  But the files in the modules directory (e.g.,
>>> modules/error) state the true license of each file, [...]
>>> "
>>> So if one would want to include the gnulib code without using the gnulib 
>>> esoteric modules mechanisms, what should one do?
>> Change it manually, I guess.
> It's still possible to use gnulib-tool for a one-time export of the
> modules in question into a temporary project with correct licensing
> headers written by gnulib-tool, then copy from that temporary project
> into qemu.  It's probably just as fast to do it by hand, but using
> gnulib-tool in the procedure (and documenting in the commit message what
> that procedure was, to make it easier for the next guy to copy) is
> probably friendlier.  I'm not sure off-hand what the command line would
> be, but could help if that's the approach we want to take.
> Or, since gnulib's memmem is (mostly) sync'd with glibc's memmem (or at
> least was in sync at one point), and glibc is under LGPLv2+, why not
> copy straight from glibc instead of from gnulib, and then you don't have
> to modify any headers?  (I haven't looked lately, but glibc may have
> optimizations that should be backported into gnulib)

I will check the status of glibc's version, seems a good idea to get it from 

> Historical note: gnulib had this particular implementation of memmem
> first, as written by me under a dual license; then I got glibc to
> incorporate it under just LGPLv2+; then since the initial
> implementation, glibc has been better optimized by others. If you really
> want to, you could observe that since I released the same original
> memmem implementation under multiple licenses at the time I first wrote
> it, you could therefore find and copy an older version under an even
> looser license in the newlib project (but without glibc's enhancements,
> as I can't dual-license someone else's follow-up work):
> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/memmem.c;h=25704e46;hb=HEAD
> Or back to the original question - why are we worrying about the O(n)
> memmem implementation when the naive O(n^2) is MUCH shorter and easier
> to understand, and where the scaling penalty is only apparent on really
> long corner case strings?  Is our use of memmem going to hit the corner
> cases where scaling matters?

For the specific use case here (memory search functionality),
the haystack is normally the factor which really can scale up a lot.

I would expect the needle to be usually smallish (though for string search 
sometimes the needle could probably arrive at the threshold of 32).

When scanning a very large memory range, considering a binary string with 
roughly length around 32, do you think that the non-trivial implementation 
could bring some noticeable benefits?

If it's below 32, do we get any benefits compared to the trivial implementation?

> [Personal note: I'm still tickled pink every time I see yet another
> project adopt my O(n) code - I didn't come up with the algorithm, but
> merely scraped it from public research, and was shocked to note how many
> libc still had O(n^2) code at the time.  But it is still gratifying to
> see that my implementation is getting such wide adoption]

Well your code and its derived versions have been tried and tested a lot, so 
picking it up just makes sense to me, to avoid getting into the same corner 
cases and issues that your implementation already had to go through. And I am a 
lazy person of course!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]