help-gawk
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How does gawk allocate memory for arrays?


From: Ed Morton
Subject: Re: How does gawk allocate memory for arrays?
Date: Mon, 30 May 2022 22:15:23 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1



On 5/30/2022 10:02 PM, Andrew J. Schorr wrote:
On Mon, May 30, 2022 at 09:46:09PM -0500, Ed Morton wrote:
That's what I suspected and I was asking out of curiosity if anyone know what
that size was and if the expansion occurs in chunks that are the same size as
the original or different sizes.
 From str_array.c:

/* grow_table --- grow a hash table */

static void
grow_table(NODE *symbol)
{
         BUCKET **old, **new;
         BUCKET *chain, *next;
         int i, j;
         unsigned long oldsize, newsize, k;
         unsigned long hash1;

         /*
          * This is an array of primes. We grow the table by an order of
          * magnitude each time (not just doubling) so that growing is a
          * rare operation. We expect, on average, that it won't happen
          * more than twice.  The final size is also chosen to be small
          * enough so that MS-DOG mallocs can handle it. When things are
          * very large (> 8K), we just double more or less, instead of
          * just jumping from 8K to 64K.
          */

         static const unsigned long sizes[] = {
                 13, 127, 1021, 8191, 16381, 32749, 65497,
                 131101, 262147, 524309, 1048583, 2097169,
                 4194319, 8388617, 16777259, 33554467,
                 67108879, 134217757, 268435459, 536870923,
                 1073741827
         };

And int_array.c:grow_int_table appears to use the same sizes.

However, cint_array.c appears to have a different approach for
arrays of consecutive integers:

/*
  * To store 2^n integers, allocate top-level array of size n, elements
  * of which are 1-Dimensional (leaf-array) of geometrically increasing
  * size (power of 2).
  *
  *  [0]   -->  [ 0 ]
  *  [1]   -->  [ 1 ]
  *  |2|   -->  [ 2 | 3 ]
  *  |3|   -->  [ 4 | 5 | 6 | 7 ]
  *  |.|
  *  |k|   -->  [ 2^(k - 1)| ...  | 2^k - 1 ]
  *  ...
  *
  * For a given integer n (> 0), the leaf-array is at 1 + floor(log2(n)).
  *
  * The idea for the geometrically increasing array sizes is from:
  *      Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays.
  *      Bagwell, Phil (2002).
  *http://infoscience.epfl.ch/record/64410/files/techlists.pdf
  *
  * Disadvantage:
  * Worst case memory waste > 99% and will happen when each of the
  * leaf arrays contains only a single element. Even with consecutive
  * integers, memory waste can be as high as 50%.
  *
  * Solution: Hashed Array Trees (HATs).
  *
  */

If the subscripts are record numbers, then I guess the "cint" array
type is the relevant one.
Great, thanks for all the info.

    Ed.

Regards,
Andy


reply via email to

[Prev in Thread] Current Thread [Next in Thread]