[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: memchr2 speed, gcc
From: |
Eric Blake |
Subject: |
Re: memchr2 speed, gcc |
Date: |
Fri, 25 Apr 2008 21:44:17 -0600 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080213 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666 |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
According to Bruno Haible on 4/25/2008 9:14 PM:
| Hi Eric,
|
|> I've got an even more efficient implementation, inspired by
|> http://www.cl.cam.ac.uk/~am21/progtricks.html
|
| Very nice! And it has no "false positive" case, like the current memchr.c
| implementation.
I discovered the web page from this glibc bug report (I only looked at the
link in the bug report, not at the proposed patch):
http://sourceware.org/bugzilla/show_bug.cgi?id=5807. Then after
implementing it, I also noticed that newlib uses the same algorithm for
some of its string functions.
|
|> Also, is anyone interested in making gnulib's memchr and strchrnul more
|> efficient by copying the optimizations learned in memchr2?
|
| That will definitely be useful, yes. But first, some polishing of the code.
| Here is a proposed patch to fix a few things on the surface:
Looks good to me, except for:
|
| ! /* Compute auxiliary longword values:
| ! repeated_one is a value which has a 1 in every byte.
| ! repeated_c1 has a c1 in every byte.
| ! repeated_c2 has a c2 in every byte. */
| ! repeated_one = 0x01010101;
| ! repeated_c1 = c1 | (c1 << 8);
| ! repeated_c2 = c2 | (c2 << 8);
| ! repeated_c1 |= repeated_c1 << 16;
| ! repeated_c2 |= repeated_c2 << 16;
| if (0xffffffffU < TYPE_MAXIMUM (longword))
| {
| ! repeated_one |= repeated_one << 31 << 1;
| ! repeated_c1 |= repeated_c1 << 31 << 1;
| ! repeated_c2 |= repeated_c2 << 31 << 1;
| ! if (8 < sizeof (longword))
| ! {
| ! int i;
| !
| ! for (i = 64; i < sizeof (longword) * 8; i *= 2)
Since we are already computing repeated_one, I'm wondering if it would be
more efficient to compute repeated_c1 = repeated_one * c1 once at the end,
rather than computing repeated_c1 in parallel with the repeated
modifications to repeated_one.
| ! machines. Choose a code that works in both cases. */
s/a //
- --
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkgSpREACgkQ84KuGfSFAYB5agCdH7JH93LAIdK13zCy7qBr6z5O
Bh8An3JRkg/4PUw3hnUzM0/HkLMNu1ol
=rvE2
-----END PGP SIGNATURE-----