bug-m4
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: poor m4 hash performance


From: Eric Blake
Subject: Re: poor m4 hash performance
Date: Sun, 04 Jun 2006 17:46:08 -0600
User-agent: Thunderbird 1.5.0.4 (Windows/20060516)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Ben Pfaff on 6/4/2006 4:15 PM:
> 
> Is there a good reason why m4 should not use a hash table that
> grows dynamically?  It is easier to deal with software that can
> figure out parameters on its own rather than having to be told.

CVS head m4 does dynamically grow its hashtables; the problem is that
backporting it to the 1.4 branch is not trivial, and at this point, the
1.4 branch is in non-invasive bug fix mode only.  The point of this thread
is to determine whether the 1.4.x hashing performance can be noticeably
improved, in which case it is a performance bug and safe to apply such a
patch.  Another thing to investigate is the choice of hashing functions;
right now, both the 1.4 branch and head use an algorithm borrowed from emacs:

size_t
m4_hash_string_hash (const void *key)
{
  int val = 0;
  const char *ptr = (const char *) key;
  char ch;

  while ((ch = *ptr++) != '\0')
    {
      if (ch >= 0140)
        ch -= 40;
      val = ((val << 3) + (val >> 28) + ch);
    };
  val = (val < 0) ? -val : val;
  return val;
}

Other hash algorithms exist which may be faster or have better spread
across the modulo bucket size chosen (m4 head always sticks with (2**n)-1
buckets, whereas 1.4 defaults to 509 but allows arbitrary -H options
regardless of whether they are relatively prime or not).

And the idea that sparked my investigation was a question of how many
lookups are done with strings already known to be in the hashtable.
Currently, the symbol table xstrdup's the string in every entry, but if we
can determine that a large number of lookups are done on strings already
known to be in the table, some judicious sharing of strings between
entries can allow pointer comparisons instead of strcmp, as well as some
memory savings by not having to duplicate as many strings.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEg3DA84KuGfSFAYARAqSRAJ4pKlMUIZJnfoOjBPAkZeqqGrhf5ACdE9L0
MWLaPnY3zhe3JuSmlIQAwXM=
=MYWb
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]