[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Question about critical_factorization() in the Two-Way algorithm

From: Eric Blake
Subject: Re: Question about critical_factorization() in the Two-Way algorithm
Date: Tue, 22 Jun 2010 08:09:35 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b2pre Mnenhy/0.8.2 Thunderbird/3.0.4

On 06/22/2010 03:18 AM, Pádraig Brady wrote:
>> http://sourceware.org/bugzilla/show_bug.cgi?id=11607
> Coincidentally, I was about to mail you about this :)
> Thanks for pointing out that bug which I was unaware of.
> I'm currently working on multibyte enhancements to coreutils
> and was using memmem() to search for single multibyte chars.
> For the degenerate unibyte case, it was fast due to
> the special case memchr() done in that situation, but
> when comparing 2 or 3 byte utf8 characters, things slowed down,
> so I was going to use memmem-simple instead.
> If you special cased 4 bytes, it would handle most UTF-8 chars
> for example.

Special-casing up to 4 byte needles at a time, even in C code, may be a
win, because that's in the realm where we can do vectorized searching
(such as how memchr searches multiple bytes at a time).  I guess I
should play more with the idea.

> Note the docs for "memmem-simple" say if fixes
>   "This function returns incorrect values in some cases, such as when
>    given an empty needle: glibc <= 2.0, Cygwin 1.5.x."
> Could that functionality be rolled into memmem-simple so that
> memmem is just the "fast/fat" version? If not could you briefly
> expand on "some cases" above as I don't know from the docs if
> I can use memmem-simple.

The idea is that memmem-simple gives you the replacement only if the
system memmem is buggy (gives wrong answers or reads beyond bounds[1])
but without regards to whether it is quadratic, while memmem also gives
you the replacement if the system memmem is slow.  Either way, if the
replacement kicks in, you are guaranteed fast.  An example where it
makes a difference - if you compile against glibc 2.7, memmem() is not
buggy but it is slow; if you only search for compile-time length-limited
needles, memmem-simple is an appropriate choice, but if you ever search
for a user-supplied needle, then use the memmem module to avoid the
possibility of a user causing a DoS by exploiting the quadratic nature.

[1] It's surprising, looking through glibc history, how many times in
just the past year that various architectures have had bugs of reading
beyond bounds as various assembler SSE algorithms are being developed.

Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]