Re: hacking on 1.7 threads

From: Julian Graham
Subject: Re: hacking on 1.7 threads
Date: Sat, 30 Oct 2004 16:45:14 -0400

Alright, after a solid week and a half of fighting the corruption
that seems to occur during the cancellation handler, I'm getting
pretty demoralized.  Here's where I am at this point:

- Realized that the GC must be aware of the list of thread cleanup
handler expressions and protect them as part of scm_thread_mark
- The scm_thread data structure removes *itself* from the all_threads
list once it's finished, so I don't think premature deallocation is a
problem
- Realized that the GC might be interrupted by a cancellation signal
in the middle of a collection, since I'm pretty sure it calls
functions that are cancellation points for deferred-cancellation POSIX
threads.  I assume that a half-finished collection could have
disastrous effects for data consistency, so I've taken the stopgap
measure of disabling cancellation while scm_igc() is running.
- It occurs to me that after the cancellation signal is received and a
bunch of pthreads stuff is unwound to call the pthread cancellation
handler, the Scheme evaluation environment for that thread may be in
some unknown state...
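
The stopgap from the third item above might look roughly like the
following.  This is only a sketch under assumptions: run_collection()
is a hypothetical stand-in for scm_igc(), and the wrapper name is made
up for illustration; it is not the code from my tree.

```c
#include <pthread.h>

/* Hypothetical stand-in for the collector entry point (scm_igc() in the
   post); it only counts calls so the sketch stays self-contained. */
static int collections_run = 0;
static void run_collection(void) { collections_run++; }

/* Disable cancellation for the duration of one collection, then restore
   whatever state the calling thread had before.  Returns 0 on success
   or a pthread error number. */
static int collect_with_cancel_disabled(void)
{
    int old_state, ignored;
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    if (rc != 0)
        return rc;
    run_collection();
    return pthread_setcancelstate(old_state, &ignored);
}
```

A cancellation request that arrives while cancellation is disabled is
not lost; it stays pending and is acted on at the next cancellation
point after the old state is restored.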

...which might explain why I've been getting SIGABRTs and SIGSEGVs
when I call scm_i_eval in my pthread cancellation handler.  Here's a
characteristic stack trace for a SIGABRT:

#42 0x40017c2c in ?? ()
#43 0x40b68228 in ?? ()
#44 0x40b681f0 in ?? ()
#45 0x40007def in _dl_lookup_symbol () from /lib/
#46 0x4008e26c in scm_cons (x=0x806e270, y=0x204) at pairs.c:59
#47 0x40058c57 in scm_i_eval (exp=0x806e270, env=0x4031dc40) at eval.c:5859
#48 0x400b4f27 in handler_cancellation (thread=0x80932a8) at threads.c:302
#49 0x4018303b in __pthread_unwind () from /lib/tls/
#50 0x4017e4a8 in sigcancel_handler () from /lib/tls/

...with many many more ?? stack frames and then a SIGABRT in some
internal libc function.  I can't seem to reproduce the SIGSEGV at the
moment.  I've tried preserving the current evaluation environment in
addition to the expression at the time of the 'push' from Scheme code,
and then evaluating the expression in that saved environment when the
pthread cancellation handler runs, but that doesn't seem to do much
good (though it does raise the question: In what environment should
the cancellation handler expressions be evaluated?  The env. at the
time they were pushed onto the list?  Or the environment at the time
the thread received the cancellation signal?  And what should the
correct error-handling behavior be during evaluation of cleanup
handler expressions?).
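
To make the first option concrete (evaluate each handler in the
environment saved when it was pushed), here is a minimal sketch in
plain C.  Everything in it is an assumption for illustration: struct
cleanup_entry, run_cleanups, and record are hypothetical names, the
"environment" is just an opaque pointer, and nothing here touches the
real scm_thread structure or scm_i_eval.  It does not settle which
environment is the right one.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for one cleanup expression together with the
   environment that was saved when it was pushed. */
struct cleanup_entry {
    void (*run)(void *env);
    void *env;
    struct cleanup_entry *next;   /* entry pushed before this one */
};

static int order[2];
static int count = 0;

/* Toy "expression": records which saved environment it ran in. */
static void record(void *env) { order[count++] = *(int *)env; }

/* pthread cleanup handler: walk the list from the most recent push to
   the oldest, running each entry in its own saved environment. */
static void run_cleanups(void *arg)
{
    struct cleanup_entry *e;
    for (e = arg; e != NULL; e = e->next)
        e->run(e->env);
}

static void *worker(void *arg)
{
    pthread_cleanup_push(run_cleanups, arg);
    for (;;) {
        pthread_testcancel();     /* explicit cancellation point */
        sched_yield();
    }
    pthread_cleanup_pop(0);       /* never reached; pairs with the push */
    return NULL;
}

/* Start a worker with two pushed entries, cancel it, and wait for it.
   Returns 0 if the thread reported PTHREAD_CANCELED. */
static int cancel_worker_demo(void)
{
    static int id1 = 1, id2 = 2;
    struct cleanup_entry first  = { record, &id1, NULL };
    struct cleanup_entry second = { record, &id2, &first };
    pthread_t t;
    void *res;

    if (pthread_create(&t, NULL, worker, &second) != 0) return -1;
    if (pthread_cancel(t) != 0) return -1;
    if (pthread_join(t, &res) != 0) return -1;
    return res == PTHREAD_CANCELED ? 0 : -1;
}
```

The entries live on the caller's stack and stay valid until
pthread_join returns, which mirrors the ordering concern quoted below:
the data the handlers need must outlive the handlers.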
  So having tried all this and more with no success, I'm kind of at my
wits' end;  if anyone would like to volunteer to take this code over
from me (it's like 50-60 lines of new code in threads.c,
threads-plugin.c, pthreads-threads.c, and a teensy little bit in
gc.c), I'd be more than happy to comment it up and post the files or a
patch to HEAD.  Or you can rewrite the whole thing from scratch, since
my design may be just plain stupid.


On Sun, 24 Oct 2004 11:29:06 +0200, Mikael Djurfeldt
<address@hidden> wrote:
> Note, though, that this is the easy part.  I do expect that there also
> could arise nasty complications having to do with the order in which
> things are done at cancellation.  It's for example important that the
> scm_thread data structure isn't deallocated before the handlers are
> invoked.  It's also important that the GC is still aware of the thread
> at that point in time.  It's important that the thread *is* properly
> deallocated *after* the handlers have run---that kind of stuff.  But
> maybe there's no problem at all.
