SCSH process forms and the signal delivery thread
From: Derek Upham
Subject: SCSH process forms and the signal delivery thread
Date: Sun, 26 Mar 2017 09:47:00 -0700
User-agent: mu4e 0.9.17; emacs 25.1.1
I'm working on an implementation of SCSH-style "process forms" for Guile, and
I'm noticing occasional hangs. I think I understand the root cause, and I'd
like people to double-check my analysis.
My code forks its process using the "primitive-fork" function. The function's
return value indicates whether the current process is the parent or the child
process. The parent and child have user-level data that start out identical
but can vary independently thereafter: stacks and heaps. The parent and child
have kernel-level data that are shared: file descriptors, and (crucially)
mutexes. All we can do to stop sharing the kernel-level data is to drop our
handles to the data.
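A minimal C sketch of the fork behaviour described above, written against plain POSIX rather than Guile. The names `fork_copy_demo` and `struct fork_result` are made up for illustration. The child changes its copy of a variable and reports the new value through a pipe created before the fork; the parent's copy stays as it was, while the pipe descriptors refer to the same kernel object in both processes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result type, for illustration only. */
struct fork_result
{
  int parent_counter;  /* parent's copy after the child has run */
  int child_counter;   /* value the child sent over the pipe, -1 on error */
};

struct fork_result
fork_copy_demo (void)
{
  struct fork_result r = { 1, -1 };
  int counter = 1;  /* user-level data: copied at fork time */
  int fds[2];       /* kernel-level data: both processes see the same pipe */
  pid_t pid;

  if (pipe (fds) != 0)
    return r;

  pid = fork ();
  if (pid == 0)
    {
      /* Child: modify the private copy and report it through the pipe. */
      counter = 2;
      if (write (fds[1], &counter, sizeof counter) != (ssize_t) sizeof counter)
        _exit (1);
      _exit (0);
    }

  /* Parent: drop our handle to the write end so read() cannot block forever. */
  close (fds[1]);
  if (pid > 0)
    {
      int from_child = -1;
      if (read (fds[0], &from_child, sizeof from_child)
          == (ssize_t) sizeof from_child)
        r.child_counter = from_child;
      waitpid (pid, NULL, 0);
    }
  close (fds[0]);

  r.parent_counter = counter;  /* the child's assignment never reaches us */
  return r;
}
```

Calling `fork_copy_demo ()` from a small `main` and printing both fields should show the two counters diverging even though the data travelled over a descriptor opened before the fork.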
The BDW-GC implementation is configured to be thread safe, in case Guile runs
multiple threads. Therefore per <http://www.hboehm.info/gc/scale.html>:
"It causes the collector to acquire a lock around essentially all allocation
and garbage collection activity."
That means after the child process spawns, there is one kernel mutex
controlling access to two heaps in two separate processes. If the child
process needs to do work in the GC layer, it blocks: the signal delivery thread
in the parent is holding the mutex, and will hold the mutex until it gets some
data on its reporting pipe. The hang shows up only when the race between the
fork and the signal delivery thread resolves in the wrong order.
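One way to observe this kind of hang outside Guile is the sketch below. It is an assumption-laden stand-in, not Guile or BDW-GC code: `lock` plays the role of the allocation lock, `holder` plays the role of the signal delivery thread blocked on its pipe, and every name is hypothetical. Note that `lock` here is an ordinary process-private pthread mutex, so the child sees a copy of its locked state rather than a handle shared with the parent; the visible symptom is the same, because the thread that could unlock it does not exist in the child. Calling `pthread_mutex_trylock` in the child of a multi-threaded process is not async-signal-safe, so treat this strictly as a diagnostic experiment.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in lock */
static int ready_pipe[2];    /* helper -> main: "I hold the lock" */
static int release_pipe[2];  /* main -> helper: "you may unlock now" */

/* Hypothetical helper thread: takes the lock, then blocks on a pipe. */
static void *
holder (void *arg)
{
  char c = 'x';
  ssize_t n;

  (void) arg;
  pthread_mutex_lock (&lock);
  if (write (ready_pipe[1], &c, 1) == 1)
    {
      n = read (release_pipe[0], &c, 1);  /* wait while holding the lock */
      (void) n;
    }
  pthread_mutex_unlock (&lock);
  return NULL;
}

/* Returns 1 if the forked child found the mutex locked, 0 if it did not,
   and -1 if the experiment itself failed. */
int
child_sees_locked_mutex (void)
{
  pthread_t tid;
  pid_t pid;
  int status = 0;
  char c = 0;

  if (pipe (ready_pipe) != 0 || pipe (release_pipe) != 0)
    return -1;
  if (pthread_create (&tid, NULL, holder, NULL) != 0)
    return -1;
  if (read (ready_pipe[0], &c, 1) != 1)  /* wait until the helper owns it */
    return -1;

  pid = fork ();
  if (pid == 0)
    /* Child: only the forking thread exists here, so nobody can unlock. */
    _exit (pthread_mutex_trylock (&lock) == EBUSY ? 1 : 0);

  /* Parent: let the helper release the lock and finish. */
  if (write (release_pipe[1], &c, 1) != 1)
    return -1;
  pthread_join (tid, NULL);
  if (pid < 0 || waitpid (pid, &status, 0) != pid)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

A child that called `pthread_mutex_lock` instead of `pthread_mutex_trylock` would block indefinitely, which matches the hang described above. Build with thread support enabled (for example `-pthread` on GCC or Clang).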
Based on this comment from scm_fork() I should be seeing a warning when I fork
with a running thread:
    scm_i_finalizer_pre_fork ();
    if (scm_ilength (scm_all_threads ()) != 1)
      /* Other threads may be holding on to resources that Guile needs --
         it is not safe to permit one thread to fork while others are
         running.

         In addition, POSIX clearly specifies that if a multi-threaded
         program forks, the child must only call functions that are
         async-signal-safe. We can't guarantee that in general. The best
         we can do is to allow forking only very early, before any call to
         sigaction spawns the signal-handling thread. */
      scm_display
        (scm_from_latin1_string
         ("warning: call to primitive-fork while multiple threads are running;\n"
          " further behavior unspecified. See \"Processes\" in the\n"
          " manual, for more information.\n"),
         scm_current_warning_port ());
(This is all Guile 2.2 code.) The call to scm_i_finalizer_pre_fork() killed
off the finalization thread, so we're safe there:
    void
    scm_i_finalizer_pre_fork (void)
    {
    #if SCM_USE_PTHREAD_THREADS
      if (automatic_finalization_p)
        {
          stop_finalization_thread ();
          GC_set_finalizer_notifier (spawn_finalizer_thread);
        }
    #endif
    }
But nothing stops the signal delivery thread. In fact, scm_all_threads()
explicitly skips the signal delivery thread; we don't get a warning:
    {
      /* We can not allocate while holding the thread_admin_mutex because
         of the way GC is done.
      */
      int n = thread_count;
      scm_i_thread *t;
      SCM list = scm_c_make_list (n, SCM_UNSPECIFIED), *l;

      scm_i_pthread_mutex_lock (&thread_admin_mutex);
      l = &list;
      for (t = all_threads; t && n > 0; t = t->next_thread)
        {
          if (t != scm_i_signal_delivery_thread)
            {
              SCM_SETCAR (*l, t->handle);
              l = SCM_CDRLOC (*l);
            }
          n--;
        }
      *l = SCM_EOL;
      scm_i_pthread_mutex_unlock (&thread_admin_mutex);
      return list;
    }
The signal delivery thread is running in order to support SCSH's "early"
auto-reap policy, triggered by SIGCHLD. The alternative is the "late" policy,
which triggers after garbage collections. That's not good for parents that do
lots of spawning but very little garbage generation compared to their heap
size. They end up with lots of zombies.
One solution to support the "early" policy might be to tweak scm_fork() so it:
1. Blocks signals.
2. Records the current custom handlers.
3. Resets all handlers.
4. Kills the signal delivery thread.
5. Forks.
6. Starts the signal delivery thread in parent and child.
7. Re-loads the custom handlers in parent and child.
8. Unblocks signals.
Does anyone have other possibilities?
I don't think there's a safe, general solution for running "identical"
finalizers in the parent and the child, so shutting down the finalizer in the
child is the best we can do. Is it worth restarting just the parent's
finalizer thread after forking?
Other, independent, cleanup opportunities:
- The docs for "primitive-fork" need to mention that calling "primitive-fork"
shuts down finalizers for the parent and the child.
- Calling "restore-signals" should stop any running signal delivery thread, to
bring Guile back to a consistent state.
Thanks,
Derek
--
Derek Upham
address@hidden