[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function
From: |
Claudio Fontana |
Subject: |
Re: [Qemu-devel] [RFC v5 1/2] util: add memmem replacement function |
Date: |
Fri, 15 May 2015 17:13:42 +0200 |
User-agent: |
Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Thunderbird/24.0.1 |
Hello Eric,
On 15.05.2015 16:56, Eric Blake wrote:
> On 05/15/2015 08:12 AM, Paolo Bonzini wrote:
>>
>>
>> On 15/05/2015 15:57, Claudio Fontana wrote:
>>> The header here mentions GPLv2+, but the module data for memmem in gnulib
>>> mentions LGPLv2+.
>>>
>>> Very confusing.
>>>
>>> The COPYING file in gnulib mentions:
>>>
>>> "The files in here are mostly copyright (C) Free Software Foundation, and
>>> are under assorted licenses. Mostly, but not entirely, GPL.
>>>
>>> Many modules are provided dual-license, either GPL or LGPL at your
>>> option. The headers of files in the lib directory (e.g., lib/error.c)
>>> state GPL for convenience, since the bulk of current gnulib users are
>>> GPL'd programs. But the files in the modules directory (e.g.,
>>> modules/error) state the true license of each file, [...]
>>> "
>>>
>>> So if one would want to include the gnulib code without using the gnulib
>>> esoteric modules mechanisms, what should one do?
>>
>> Change it manually, I guess.
>
> It's still possible to use gnulib-tool for a one-time export of the
> modules in question into a temporary project with correct licensing
> headers written by gnulib-tool, then copy from that temporary project
> into qemu. It's probably just as fast to do it by hand, but using
> gnulib-tool in the procedure (and documenting in the commit message what
> that procedure was, to make it easier for the next guy to copy) is
> probably friendlier. I'm not sure off-hand what the command line would
> be, but could help if that's the approach we want to take.
>
> Or, since gnulib's memmem is (mostly) sync'd with glibc's memmem (or at
> least was in sync at one point), and glibc is under LGPLv2+, why not
> copy straight from glibc instead of from gnulib, and then you don't have
> to modify any headers? (I haven't looked lately, but glibc may have
> optimizations that should be backported into gnulib)
I will check the status of glibc's version, seems a good idea to get it from
there.
>
> Historical note: gnulib had this particular implementation of memmem
> first, as written by me under a dual license; then I got glibc to
> incorporate it under just LGPLv2+; then since the initial
> implementation, glibc has been better optimized by others. If you really
> want to, you could observe that since I released the same original
> memmem implementation under multiple licenses at the time I first wrote
> it, you could therefore find and copy an older version under an even
> looser license in the newlib project (but without glibc's enhancements,
> as I can't dual-license someone else's follow-up work):
> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;a=blob;f=newlib/libc/string/memmem.c;h=25704e46;hb=HEAD
>
> Or back to the original question - why are we worrying about the O(n)
> memmem implementation when the naive O(n^2) is MUCH shorter and easier
> to understand, and where the scaling penalty is only apparent on really
> long corner case strings? Is our use of memmem going to hit the corner
> cases where scaling matters?
For the specific use case here (memory search functionality),
the haystack is normally the factor which really can scale up a lot.
I would expect the needle to be usually smallish (though for string search
sometimes the needle could probably arrive at the threshold of 32).
When scanning a very large memory range, considering a binary string with
roughly length around 32, do you think that the non-trivial implementation
could bring some noticeable benefits?
If it's below 32, do we get any benefits compared to the trivial implementation?
>
> [Personal note: I'm still tickled pink every time I see yet another
> project adopt my O(n) code - I didn't come up with the algorithm, but
> merely scraped it from public research, and was shocked to note how many
> libc still had O(n^2) code at the time. But it is still gratifying to
> see that my implementation is getting such wide adoption]
>
Well your code and its derived versions have been tried and tested a lot, so
picking it up just makes sense to me, to avoid getting into the same corner
cases and issues that your implementation already had to go through. And I am a
lazy person of course!
Ciao,
Claudio