Well, I solved it.
Each time Chicken does a GC (major or minor), it calls getrusage before
and after so it can calculate the amount of CPU time used in the GC.
Unfortunately, each getrusage call takes around 60 microseconds on this
SPARC SunFire v120, and those calls turn out to account for about 90% of
the total runtime of my program.
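
To check the per-call cost of getrusage on another machine, something like the sketch below should do. It is not part of Chicken and not the patch; the helper name getrusage_cost_usec is made up for illustration, and it assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC available.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical helper (not from the Chicken sources): returns the average
   wall-clock cost of one getrusage() call, in microseconds, measured over
   `calls` back-to-back calls. Returns 0.0 if `calls` is not positive. */
static double getrusage_cost_usec(long calls)
{
    struct rusage ru;
    struct timespec start, end;
    double elapsed_usec;
    long i;

    if (calls <= 0)
        return 0.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < calls; i++)
        getrusage(RUSAGE_SELF, &ru);   /* the call the GC makes twice per collection */
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Convert the elapsed seconds and nanoseconds to microseconds. */
    elapsed_usec = (double)(end.tv_sec - start.tv_sec) * 1e6
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e3;
    return elapsed_usec / (double)calls;
}
```

Calling getrusage_cost_usec(1000000) from a small main and printing the result gives a figure to compare against the 60 microseconds above. Since every GC makes two such calls, a program that triggers many minor GCs pays roughly twice that figure per collection.
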
With the following patch the runtime of my test program drops from 94
seconds to 9 seconds!
This turns out to be a significant improvement not only for very short
calls into Scheme, but also for normal programs. For example, it
dramatically lowers all the times for nsample, especially with small
nursery sizes: