help-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Identifying sources of allocations in a toy Mandelbrot package


From: Psionic K
Subject: Re: Identifying sources of allocations in a toy Mandelbrot package
Date: Sat, 27 Jan 2024 10:07:09 +0900

> I find it interesting that:
> - native compilation improves so little compared to byte compilation.
> - floating point calculations are very slow.
> - emacs-lisp is very slow

>From first principles and numbers I have seen so far, I can guess
what's happening with little uncertainty.  The Elisp interpreter must
store all created conses and non-immediate values, however
short-lived, into memory, converting every computationally intensive
workload into a memory-intensive workload.  None of this is freed for
re-use between GC's, so in about 1/100th of a second, the memory
controller is fully saturated and the CPU grinds to a halt.

There are no workarounds for this using GC settings.  The GC is
non-generational, so frequent GC's must mark & sweep a lot of memory
that is irrelevant to the currently running function.  Frequent GC's
will saturate the memory attempting to free small amounts that are
relevant to the cache sizes.  Furthermore, the GC is non-moving,
resulting in fragmented reads and writes, so even for shorter
computations, much of the loaded cache could be wasted.

The sum of these behaviors is still faster than human speed, but
almost pathalogically worst case usage of memory and cache, requiring
multiple trips to RAM for almost all computations and limiting
effective CPU register bandwidth to a small multiple of RAM.  The
programmer has few tools to increase computational density and all
workloads are memory bandwidth limited.

To remedy the most egregious behavior, which is that computations
imply writes, we would take advantage of how functions usually throw
away all of their intermediate values except the return value and
implement stack-like behavior by compacting and freeing after cons
thresholds calculated at call boundaries.  We can still give this
compacted memory back to our pathologically worst GC, but it will be
less fragmented and there will be much less of it.

A lot of this can be decided at compile time, and the compiler would
write values that need to be preserved into one stack while writing
other values at a different stack and re-setting the pointer.  I can
tell that this is not happening at all because otherwise the
Mandelbrot would not be consuming multiple gigabytes of RAM for values
that are thrown away at first use.

Elisp could be exceptionally fast for small runs, especially with
today's generous L1 sizes.   That drops off as the interpreter and new
data start hitting L2, L3, and then RAM.  Any work on the compiler
should focus on keeping memory from being entrusted to the existing GC
because no amount of doing less work means anything if all work also
consumes memory and freeing memory is hugely wasteful.

Fixed point was neat!  I didn't think to try that.  Thanks for code!



reply via email to

[Prev in Thread] Current Thread [Next in Thread]