Re: [bug-gawk] Memory usage of multi-dimensional arrays


From: Wolfgang Laun
Subject: Re: [bug-gawk] Memory usage of multi-dimensional arrays
Date: Thu, 22 Mar 2018 16:20:03 +0100

You are welcome.

I might have added that there shouldn't be any additional memory consumption for the 2nd, 3rd, ... array element of each row, since these will be hashed into the 4 kB area reserved when a row is allocated. Only when a row's hash table reaches a certain fill limit will you observe another significant increase. The details depend on the implementation; you might check the general literature on hash tables if you aren't familiar with the technique.

-W
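
The effect described above can be made visible with a variant of example 2 quoted below: filling a single row with 100k elements allocates only one subarray, so the 2nd, 3rd, ... elements are hashed into that row's existing table and memory should grow by bytes, not kilobytes, per element. A minimal sketch, assuming gawk 4.1's arrays of arrays and the same "ps" measurement caveats as in the quoted examples; the exact thresholds at which the table is enlarged depend on gawk's hash table implementation:

## one row, many columns: only a single subarray (one hash table) is
## allocated, so additional elements reuse its storage until it must grow
BEGIN { for (J = 0; J < 100000; J++) X[0][J] = 0 }
## wait to allow for memory analysis
END { while (1 == 1) Y = 0 }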




On 22 March 2018 at 16:10, Christian Schneider <address@hidden> wrote:
Hi Wolfgang,

Thank you very much for the clarification.  To be honest, without your
extra explanations, I would probably not have understood the quote.

Best,
Christian

On Fri, 2018-03-16 00:27:32 (-0700), Wolfgang Laun wrote:
> It does say something about the behaviour: "...implementations, which
> *typically use hash tables* to store array elements and values."
>
> This means that in the two-dimensional case you have 100000 hash tables.
>
> It would still be possible to create a big two-dimensional array, but
> swapping would set in and it would become very slow.
>
> No, there is no way of making that behaviour more efficient.
>
> Either map the two index values to a single key using a function [see the
> sketch at the end of this thread], or you are using the wrong tool for
> your task.
>
> Cheers
> Wolfgang
>
> On 16 March 2018 at 02:23, Christian Schneider <address@hidden>
> wrote:
>
> > Hi all,
> >
> > I encountered an interesting behaviour with multi-dimensional arrays and
> > was wondering if this is expected or a bug:
> >
> > Example 1:
> >
> > ## create array with 100k elements
> > BEGIN { for (I = 0; I < 100000; I++) X[I] = 0 }
> > ## wait to allow for memory analysis
> > END { while (1 == 1) Y = 0 }
> >
> > Result:
> >
> > The memory usage (observed with "ps", with all its limitations) is on
> > the order of a few (~8) bytes per element, as expected.
> >
> > Example 2:
> >
> > ## create multi-dimensional array with 100k elements
> > BEGIN { for (I = 0; I < 100000; I++) X[I][I] = 0 }
> > ## wait to allow for memory analysis
> > END { while (1 == 1) Y = 0 }
> >
> > Result:
> >
> > It uses a few (~4) kB per element.  This also means an array with 1M
> > elements cannot even be created on a machine with 8 GB of RAM.
> >
> > I could not find any documentation on that behaviour.  If it is
> > considered "normal", could you mention this somewhere, please?  Is there
> > a way to make these arrays more efficient?
> >
> > Thank you very much for any comments.
> >
> > Best regards,
> > Christian
> >
> > P.S.: Please CC me, as I am not subscribed.
> >
> > P.P.S.: version:
> > GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5, GNU MP 6.1.2)
> > from: Debian 9.4, amd64
> >
> >

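Wolfgang's suggestion above to "map the two index values to a single key" has a classic awk form that needs no explicit function at all: the simulated multi-dimensional array, where X[i, j] joins the two indices with SUBSEP into a single string key, so every element lives in one flat hash table instead of one subarray (and its ~4 kB table) per row. A minimal sketch of example 2 rewritten this way, under the same measurement caveats as above:

## simulated two-dimensional array: the comma joins both indices with
## SUBSEP into one string key, so all 100k elements share one hash table
BEGIN { for (I = 0; I < 100000; I++) X[I, I] = 0 }
## wait to allow for memory analysis
END { while (1 == 1) Y = 0 }

Membership is then tested with "if ((i, j) in X)", and a combined key k from "for (k in X)" can be taken apart again with split(k, idx, SUBSEP).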
