
Re: Growable arrays?

From: David Kastrup
Subject: Re: Growable arrays?
Date: Thu, 14 Jun 2012 19:15:31 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1.50 (gnu/linux)

Daniel Hartwig <address@hidden> writes:

> On 14 June 2012 23:34, David Kastrup <address@hidden> wrote:
>> Huh?  When resizing a hash table by doubling, you need to recoalesce
>> each bucket to two buckets, one of which is the original.  Doubling
>> the size of the underlying vector is a reasonable start to do this
>> operation as a half-in-place rather than a three-address variant.
> What is this half-in-place algorithm that makes this efficient?  If
> the table is to remain balanced, all items should be rehashed and
> reallocated to a new bucket and there is no correlation between an
> item's old and new buckets (hash).

Huh?  Hashing computes a hash value, and the bucket is determined by
hash % vectorsize.  Double the vector size, and each entry's new bucket
is either bucket or bucket+vectorsize (the old size), so all you need to
do is split each old bucket between those two locations.

Why would there be no correlation between old and new buckets when they
are calculated based on the same hash code?

> 1. allocate new vector
> 2. assign each item to its new bucket
> 3. destroy old vector
> … when the table does not control its own memory but uses an
> underlying resizable type:
> 1. resize vector
> 1a. allocate new vector
> 1b. move data to head of new vector
> 1c. destroy old vector
> 2. assign each item to its new bucket but this time watch out for
> already populated buckets …

I have no idea what you are talking about.  Each bucket contains a list.

> which to me /seems/ less efficient, and particularly bad if the
> buckets are linked lists or open addressed.

Complexity is pretty much the same if we don't bother about memory
locality.  Being able to process the buckets sequentially, however, is
advantageous because they then stay mostly in cache.  Splitting a single
bucket list into two lists is easy to do in-place.

> But I am curious if this can be done efficiently, and indeed if it is
> more efficient than the table managing its own memory.

It can never be more efficient than "the table managing its own memory"
since the table is then obviously able to pick the best algorithm.  Which
in this case is simply identical to the general-purpose algorithm.  So
while it is not more efficient in computing resources (apart from the
small impact due to duplicate code), it is more efficient in programmer
resources: cost of writing, understanding, debugging, documenting.

David Kastrup
