Hi Andy,
Thanks! I just tried out the master branch of guile in git (the one tagged v2.9.2). It now passes all of my unit tests. So that's good! ... More or less -- there's still some infrequent multi-threading bug(s). Let me describe.
My unit test just transitions C->guile->C and returns, in rapid succession, in 20 threads. So, in pseudocode:
SCM do_stuff(SCM a, SCM b) {
scm_to_utf8_string(a);
scm_to_int(b);
... minor stuff... return scm_from_int(...);
}
scm_c_define_gsubr("do-things",2,0,0, do_stuff);
void thing_doer(int thread_id) {
    for (int i=0; i<15000; i++) {
        char str[100];
        snprintf(str, sizeof(str), "(do-things foo %d)", i);
        // pseudocode: the tag and handler arguments of scm_c_catch are elided
        scm_c_catch(scm_eval_string, str);
    }
}
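int main () {
    std::vector<std::thread> threads;
    for (int i=0; i<15; i++) // start 15 threads
        threads.emplace_back(&thing_doer, i);
    for (auto& t : threads) // wait for all of them to finish
        t.join();
}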
I'm guessing the above code spends maybe 90% of its time bouncing between guile and C. The string "(do-things foo 42)" changes each time in the loop, so I'm not sure how the compile vs. interpret tradeoff is done. Either way, it's relatively trivial. Likewise, the do_stuff() C routine is fairly thin; after decoding its args, it doesn't do all that much (sub-microsecond of computing). Based on old, old measurements, scm_eval_string really is the primary CPU consumer, in the 20-microseconds range. Launching 15 threads means that this thing is racing as fast as it can.
Anyway, with guile-2.9.2, the above crashes after about 10-15 minutes, either with memory corruption or with a segfault. I was worried, so I retested with guile-2.2.4 ... which also crashes, but much, much less frequently: seven times in 44 hours of wall-clock time (so about once every 6 hours). Which is still more than desired, but...OK.
So where's the crash? No clue. Could be my code, could be guile. Since there's a big difference between guile-2.2 and guile-2.9, I'm ready to blame guile. I did try to run it with `valgrind --tool=helgrind` and got an ocean of complaints about guile GC, which are probably harmless. I haven't tried to dig deeper yet.
-- Linas