bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] Adding a real HashTable implementation to gnulib


From: Bruno Haible
Subject: Re: [RFC] Adding a real HashTable implementation to gnulib
Date: Wed, 05 Dec 2018 00:43:26 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-138-generic; KDE/5.18.0; x86_64; ; )

Hi Tim,

> Currently we have
> - positive integer (N): resize by old value * N
> - negative integer (N): resize by old value + (-N)

It's currently the opposite:
- positive integer (N): resize to old size + N
- negative integer (N): resize to old size * (-N)

> >   - The ability to set a growth policy > 0. This leads to quadratic
> >     run time behaviour. I mean, you make such an effort to have O(1)
> >     access on average, and then the fixed-size increment of the size
> >     turns the insertion operation into an average-O(n) operation.
> 
> In Wget2 we use reasonable default sizes for the hashmap to make
> resizing more or less a corner case.

The problem is that when you guessed wrong about the "corner case",
the rehash will consume 99.9% of your program's run time.

> How can we do it better ?

1. Remove the ability to "resize to old size + N",
2. (optional) When "resize to old size * FACTOR", allow the factor
   to be 1.5 or 1.3 or something like that.

> >   - Some functions take keysize and valuesize arguments, some don't.
> >     I'm confused.
> 
> The "noalloc" functions just store the given pointers for key and value,
> that means without allocation. The hashmap takes ownership of the
> objects and has to release them when being removed/destroyed (or use a
> NULL destructor when the values are static). This can be used to
> generate a search index for an existing bunch of values (objects /
> structures). Like a "unique index" for a field in a database table.
> Example function: wget_hashmap_put_noalloc()
> 
> Then we have the functions that do (shallow) allocations/clones, like
> wget_hashmap_put(). This is for using the hashmap as a container, the
> original objects have to be (shallow) released outside of the container.

Hmm. What you describe here is whether ownership is passed to the container
or kept outside the container. My question was about the keysize and
valuesize arguments, that is, about the nature of the key and value objects
(a. a pointer, pointing to anything in memory, or
 b. a region of memory, with a start address and an end address).

I think both questions are orthogonal:
  - In either a or b, you can keep the ownership outside the container
    by using a NULL destructor function.
  - For container-owned objects:
    In case a, you need a clone or copy function indeed,
    In case b, the container needs to clone precisely the given region of
    memory.

But I haven't thought long about it. What do you think?

Bruno




reply via email to

[Prev in Thread] Current Thread [Next in Thread]