[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC] Adding a real HashTable implementation to gnulib
From: |
Bruno Haible |
Subject: |
Re: [RFC] Adding a real HashTable implementation to gnulib |
Date: |
Sun, 02 Dec 2018 14:41:01 +0100 |
User-agent: |
KMail/5.1.3 (Linux/4.4.0-138-generic; KDE/5.18.0; x86_64; ; ) |
Hi,
Darshit Shah wrote:
> I recently tried to use the hash table implementation in gnulib which resides
> in the "hash" module. However, I quickly realised that the hash table in
> gnulib
> seems to be what is otherwise popularly known as a hash set, i.e., it supports
> storing and retrieving just values from the structure.
>
> On the other hand, a hash table is usually expected to have a key->value
> mapping that is stored.
I agree that the gnulib 'hash' module is just a particular case, and
probably the module name is not very descriptive.
> Within GNU Wget, we have a fairly portable version of a hash table implemented
> which I think would be a good addition for gnulib. What do you think?
There's not only the one from wget
https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h
https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c
but also the one from gettext
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c
and the one from glib
https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c
and the one from libxml
https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h
https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c
and the ones from CLN
https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash
and many more.
The implementation you are proposing is an "open-addressed table with linear
probing collision resolution". To me that is unacceptable. When I used Kyoto
Common Lisp (KCL) many years ago, I got an endless loop during a hash table
access, and it was precisely because of this open-addressed table structure.
I don't want a code which requires careful setting of parameters in order
not to run into an endless loop. Instead better have a code that cannot run
into an endless loop *by design*.
The hash_string function that you propose shifts by 5 bits at each step;
I suspect that it has the same problem as the one I tested and discussed in
https://haible.de/bruno/hashfunc.html .
For Gnulib, I would want a generic container, i.e. a "map", like we have
"list" and "ordered set" already (modules 'list' and 'oset'). Other GNU
maintainers have reported that they like this approach.
However, this will still not fit all possible needs because there are
special cases that people want to see optimized:
- The case when the key is a string; additionally when the key is
allocated in an obstack and there is no remove.
- The struniq function (as in localename.c).
Then, what about extra requirements?
- The existing gnulib 'hash' module is pretty unique: it keeps statistics.
But is anyone really using this feature?
- malloc vs. xmalloc.
- Multithread-safety should IMO not be considered as an extra requirement.
This is better done in application logic, because typically in the scope
of the lock the application will do more than just the hash table lookup.
Bruno
- Re: [RFC] Adding a real HashTable implementation to gnulib,
Bruno Haible <=