[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: gnulib's Knuth-Morris-Pratt implementation
From: |
Eric Blake |
Subject: |
Re: gnulib's Knuth-Morris-Pratt implementation |
Date: |
Fri, 4 Jan 2008 23:56:26 +0000 (UTC) |
User-agent: |
Loom/3.14 (http://gmane.org/) |
Bruno Haible <bruno <at> clisp.org> writes:
>
> But this argument also makes it unimportant to use Boyer-Moore over KMP:
> Once you've spent 5*n cycles on the simple algorithm, you don't care whether
> the computation will be finished in additional 2*n cycles or 2*n/m cycles.
>
> > The main problem
> > though is that the B-M algorithm (I guess we'd likely select
> > Turbo-Boyer-Moore to reduce the comparisons from 3N to 2N) needs to
> > move backward through the haystack. That doesn't work well on
> > multibyte strings.
>
> Yes, that's the reason why I chose KMP in the first place.
After more browsing for string algorithms, I think the Two-Way algorithm may be
the best fit for memmem (and strstr, if we decide to also replace that on
systems with quadratic strstr):
http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
It has the nice property of O(1) preprocessing and O(N) comparisons (strictly
bounded at 2N-M in the worst case), at the expense of random access to the data
(therefore not appropriate for mbsstr) and an ordered alphabet (so I'm not sure
if it can be used for strcase), but may actually be an algorithm that will be
acceptable to the glibc people as it does not require malloc. I'll play with
implementing it for memmem, and see how it goes.
--
Eric Blake
- Re: gnulib's Knuth-Morris-Pratt implementation,
Eric Blake <=