help-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How does gawk allocate memory for arrays?


From: Andrew J. Schorr
Subject: Re: How does gawk allocate memory for arrays?
Date: Mon, 30 May 2022 23:02:49 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, May 30, 2022 at 09:46:09PM -0500, Ed Morton wrote:
> That's what I suspected and I was asking out of curiosity if anyone know what
> that size was and if the expansion occurs in chunks that are the same size as
> the original or different sizes.

>From str_array.c:

/* grow_table --- grow a hash table */

static void
grow_table(NODE *symbol)
{
        BUCKET **old, **new;
        BUCKET *chain, *next;
        int i, j;
        unsigned long oldsize, newsize, k;
        unsigned long hash1;

        /*
         * This is an array of primes. We grow the table by an order of
         * magnitude each time (not just doubling) so that growing is a
         * rare operation. We expect, on average, that it won't happen
         * more than twice.  The final size is also chosen to be small
         * enough so that MS-DOG mallocs can handle it. When things are
         * very large (> 8K), we just double more or less, instead of
         * just jumping from 8K to 64K.
         */

        static const unsigned long sizes[] = {
                13, 127, 1021, 8191, 16381, 32749, 65497,
                131101, 262147, 524309, 1048583, 2097169,
                4194319, 8388617, 16777259, 33554467,
                67108879, 134217757, 268435459, 536870923,
                1073741827
        };

And int_array.c:grow_int_table appears to use the same sizes.

However, cint_array.c appears to have a different approach for 
arrays of consecutive integers:

/*
 * To store 2^n integers, allocate top-level array of size n, elements
 * of which are 1-Dimensional (leaf-array) of geometrically increasing
 * size (power of 2).
 *
 *  [0]   -->  [ 0 ]
 *  [1]   -->  [ 1 ]
 *  |2|   -->  [ 2 | 3 ]
 *  |3|   -->  [ 4 | 5 | 6 | 7 ]
 *  |.|
 *  |k|   -->  [ 2^(k - 1)| ...  | 2^k - 1 ]
 *  ...
 *
 * For a given integer n (> 0), the leaf-array is at 1 + floor(log2(n)).
 *
 * The idea for the geometrically increasing array sizes is from:
 *      Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays.
 *      Bagwell, Phil (2002).
 *      http://infoscience.epfl.ch/record/64410/files/techlists.pdf
 *
 * Disadvantage:
 * Worst case memory waste > 99% and will happen when each of the
 * leaf arrays contains only a single element. Even with consecutive
 * integers, memory waste can be as high as 50%.
 *
 * Solution: Hashed Array Trees (HATs).
 *
 */

If the subscripts are record numbers, then I guess the "cint" array
type is the relevant one.

Regards,
Andy



reply via email to

[Prev in Thread] Current Thread [Next in Thread]