I must admit that my statement is mostly theoretical, based on the profiling data, looking at the code and realizing how many steps there are that the compiler cannot possibly optimize away. Making a prediction about absolute gain is hard without trying.
What did David do? Throw out reference counting completely? How much effort is that? I would like to try it out myself and do some actual measurements.
I removed the reference counting code entirely and replaced it with libgc.
That was not a huge effort. As I remember, the hardest part was auditing the code for cases where timely destruction of heap objects was important. IIRC there were a few such occurrences around the Editor and UI code.
What disadvantages of garbage collection would be significant to TeXmacs? The main problem that I know of is that memory requirements may be somewhat higher due to delayed deallocation. That, however, may easily be outweighed by the size reduction of every object once the counter field itself is removed.
From memory, the transition caused approximately a 2x increase in memory usage and a 2x slowdown in large operations such as compiling the complete documentation. The slowdown in interactive use was noticeable too.
TeXmacs has memory usage patterns that make it tricky for a generic GC to support: it allocates vast numbers of very small objects (mainly tree_rep and string_rep), many of which are long-lived. Presumably a generational GC would perform well, but IIRC libgc was not generational at the time.
It is entirely possible that better generic GC services are available today that would provide better results.
How would const references help avoid reference counting?
A large amount of the no-op reference counting done by texmacs occurs for method parameters: when a counted pointer is passed into a method call, it is incremented when entering the method and decremented when unwinding the method's stack frame. This is particularly bad in tree-traversal code, where every tree_rep object in a tree is touched once on recursive descent and then again on unwinding, causing many cache misses.
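The pattern can be sketched with a toy counted pointer (the names are illustrative, not the real texmacs classes): every pass-by-value of the counted handle touches the pointee's counter on entry to the frame and again on destruction.

```cpp
#include <cassert>

// Toy texmacs-style structures (illustrative names, not the real code).
struct tree_rep {
    int refs = 0;
    tree_rep* left = nullptr;
    tree_rep* right = nullptr;
};

// Instrumentation: count every touch of a reference counter.
int inc_count = 0;
int dec_count = 0;

struct counted {
    tree_rep* p;
    explicit counted(tree_rep* q) : p(q) { if (p) { ++p->refs; ++inc_count; } }
    counted(const counted& o) : p(o.p)   { if (p) { ++p->refs; ++inc_count; } }
    ~counted()                           { if (p) { --p->refs; ++dec_count; } }
};

// Pass-by-value: each frame of the recursion constructs a counted handle,
// incrementing the node's counter on the way down and decrementing it on
// the way back up -- touching (and possibly missing on) every node's
// cache line twice, even though the traversal never stores the pointer.
int depth(counted t) {
    if (!t.p) return 0;
    int l = depth(counted(t.p->left));
    int r = depth(counted(t.p->right));
    return 1 + (l > r ? l : r);
}
```

Walking a three-node tree this way generates one increment and one decrement per node visited, none of which changes the outcome of the traversal.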
Presumably, good speedups could be achieved by using uncounted pointers whenever the compiler could prove that (1) the callees do not store the pointer in data structures, and (2) the pointer cannot be returned up the call stack above the caller that owns the last counted reference to the object.
I failed to find a way to get the C++ compiler to prove point 2 that did not require major changes to the texmacs code, so I gave up on this approach. Some of the newer C++0x language features might make it possible to solve this problem.
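When the two conditions do hold, the saving can be hand-coded today: keep one counted handle alive in the caller and run the recursion on plain pointers. A minimal sketch, with hypothetical names standing in for the texmacs types:

```cpp
#include <cassert>

// Toy texmacs-style structures (illustrative names, not the real code).
struct tree_rep {
    int refs = 0;
    tree_rep* left = nullptr;
    tree_rep* right = nullptr;
};

int ref_traffic = 0;  // instrumentation: every counter touch

struct counted {
    tree_rep* p;
    explicit counted(tree_rep* q) : p(q) { if (p) { ++p->refs; ++ref_traffic; } }
    counted(const counted& o) : p(o.p)   { if (p) { ++p->refs; ++ref_traffic; } }
    ~counted()                           { if (p) { --p->refs; ++ref_traffic; } }
};

// The recursion runs on uncounted raw pointers.  This is safe only because
// (1) nothing here stores the pointer and (2) it is never returned above
// the caller holding a counted reference -- exactly the two conditions a
// compiler would have to prove to do this transformation automatically.
static int count_rep(const tree_rep* r) {
    return r ? 1 + count_rep(r->left) + count_rep(r->right) : 0;
}

// The entry point takes the handle by const reference, so no counted copy
// (and no counter traffic) happens anywhere in the walk.
int count_nodes(const counted& t) {
    return count_rep(t.p);
}
```

The whole traversal then produces zero refcount traffic beyond the caller's single handle; the cost of the approach is that the safety argument lives in the programmer's head rather than in the type system.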
While we speak of memory management, I had suspected for a long time that one cause of texmacs slowness is poor spatial locality in the omnipresent tree-string structure. IIRC, the small-objects allocator of texmacs uses a naive LIFO free-list approach to reusing memory. I suspect this leads to tree structures whose neighboring nodes are scattered in memory.
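A minimal sketch of such a LIFO free list (an assumption about the strategy described, not the actual texmacs allocator): allocation pops whichever block was freed most recently, so the addresses handed to a freshly built tree depend on the history of past frees, not on which nodes will end up as neighbors.

```cpp
#include <cassert>
#include <cstdlib>

// Naive LIFO free list for small fixed-size objects (a sketch of the
// strategy, not the real texmacs allocator).
struct free_block { free_block* next; };
static free_block* free_list = nullptr;

void* small_alloc(std::size_t sz) {
    if (free_list) {                       // pop the most recently freed block
        free_block* b = free_list;
        free_list = b->next;
        return b;
    }
    std::size_t n = sz < sizeof(free_block) ? sizeof(free_block) : sz;
    return std::malloc(n);
}

void small_free(void* p) {                 // push onto the front of the list
    free_block* b = static_cast<free_block*>(p);
    b->next = free_list;
    free_list = b;
}
```

Because reuse is last-in-first-out, two blocks freed in the order a, b come back in the order b, a; after enough churn, logically adjacent tree nodes land at unrelated addresses.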
One project that could potentially have large benefits in reducing memory management overhead and cache misses would be to use contiguous buffers for both tree and string blocks, forming a single structure. The blocks themselves would not be managed (no GC or refcounting), and the structure would be periodically repacked (maybe using a copying-GC approach) to reclaim memory and place newly inserted blocks at the ideal position in the structure.
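A repacking pass of this kind might look as follows (a sketch under an assumed index-based layout, not existing texmacs code): live nodes are copied depth-first into a fresh contiguous buffer, so each subtree becomes a compact run of memory, and nodes that are no longer reachable are reclaimed simply by not being copied.

```cpp
#include <cassert>
#include <vector>

// Nodes live in one contiguous buffer and refer to each other by index
// (hypothetical layout for illustration).
struct node {
    int label;
    int first_child  = -1;   // index into the buffer, -1 = none
    int next_sibling = -1;
};

// Copying-GC-style repack: walk the tree rooted at `root` depth-first and
// emit every live node into `out`.  Unreachable nodes are simply not
// copied, and each subtree ends up contiguous in the new buffer.
// Returns the new index of `root`.
int repack(const std::vector<node>& old, int root, std::vector<node>& out) {
    int idx = static_cast<int>(out.size());
    out.push_back(old[root]);            // child/sibling links fixed up below
    int prev = -1;
    for (int c = old[root].first_child; c != -1; c = old[c].next_sibling) {
        int nc = repack(old, c, out);
        if (prev == -1) out[idx].first_child  = nc;
        else            out[prev].next_sibling = nc;
        out[nc].next_sibling = -1;
        prev = nc;
    }
    return idx;
}
```

After the pass, a depth-first traversal of the new buffer is a near-sequential memory scan, which is exactly the locality property the current free-list reuse destroys.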