

[coreutils] another potential sort bug?

From: Eric Blake
Subject: [coreutils] another potential sort bug?
Date: Thu, 13 Jan 2011 19:43:06 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.7

POSIX is explicit that in a multi-threaded program, calling any
non-async-signal-safe function between fork() and exec*()/_exit() may
result in undefined behavior.  For example, if thread 1 is in the middle
of a malloc() and holds its mutex when thread 2 calls fork(), then any
attempt by the child process to call malloc() will deadlock: the
malloc() in the child tries to obtain the mutex, but thread 1 does not
exist in the child to release it.

Most of the coreutils that use fork/exec (install, timeout) are
single-threaded, so this is not an issue for them.  But now that sort is
multi-threaded, the fact that it calls fork/exec means we have to audit
all the code the child runs for safety.  Right now, I see several places
where the child can call error(_()), and error() is definitely not
async-signal-safe: the call to gettext() may malloc, and error() itself
certainly calls malloc() and various stdio functions.
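For illustration, here is a minimal sketch of a child that stays within those rules. The helper name run_child() and its message text are hypothetical, not coreutils code. Between fork() and execv() the child uses only execv(), write() and _exit(), all of which POSIX lists as async-signal-safe, and the message is a fixed, untranslated string so that no gettext(), malloc() or stdio is involved.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not coreutils code): run PATH with ARGV and return
   the child's exit status, or -1 if fork/waitpid fails or the child did
   not exit normally.  */
static int
run_child (const char *path, char *const argv[])
{
  /* Fixed, untranslated message; sizeof avoids calling strlen() in the
     child.  */
  static const char msg[] = "run_child: exec failed\n";

  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      execv (path, argv);
      /* Only reached if execv failed.  No error(), gettext() or stdio
         here: write() and _exit() are async-signal-safe, so a malloc or
         stdio lock held by another parent thread cannot deadlock us.  */
      ssize_t ignored = write (STDERR_FILENO, msg, sizeof msg - 1);
      (void) ignored;
      _exit (127);
    }

  int status;
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

The cost of this approach is that the child can only emit untranslated, preformatted text; any formatting or translation has to happen in the parent.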

It is still possible to use error() in the child if we can guarantee
that no other thread can be in the middle of any of the
non-async-signal-safe functions used by the child, and thus no deadlock
can occur in any of those functions.

One thought is that pthread_atfork() can provide such a guarantee.
POSIX admits that it doesn't scale well to libraries, where there is no
way to know what other functions or mutexes might be used in other
threads, but perhaps we can make it work in sort, where we can easily
audit every resource that must be locked before the child attempts an
otherwise unsafe action, since we aren't linking against multi-threaded
libraries.  But that would mean hiding every call to malloc() and every
stdio function behind a mutex controlled by the pthread_atfork handlers,
which seems prohibitively expensive.
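To make the cost concrete, a sketch of the pthread_atfork() pattern under a simplifying assumption: a single hypothetical mutex, resource_lock, stands in for "everything the child might touch" (in sort that would have to cover malloc() and stdio). The prepare handler takes the lock in the forking thread, so fork() happens only while no other thread is inside the critical section; the parent and child handlers then release it. All names here are illustrative.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock guarding a hypothetical shared resource.  */
static pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;
static long resource_counter;

/* prepare: runs in the parent just before fork(); blocks until no other
   thread holds the lock.  */
static void prepare (void) { pthread_mutex_lock (&resource_lock); }

/* parent/child: run after fork() in each process; both start with the
   mutex unlocked and in a consistent state.  */
static void release (void) { pthread_mutex_unlock (&resource_lock); }

static int
install_fork_handlers (void)
{
  return pthread_atfork (prepare, release, release);
}

/* Every use of the resource, in every thread, must take the lock; this
   is the per-call overhead that would be paid in the common case.  */
static void
use_resource (void)
{
  pthread_mutex_lock (&resource_lock);
  resource_counter++;
  pthread_mutex_unlock (&resource_lock);
}

/* Background worker standing in for other sort threads.  */
static void *
hammer (void *arg)
{
  long n = *(long *) arg;
  for (long i = 0; i < n; i++)
    use_resource ();
  return NULL;
}

/* Fork a child that uses the resource; return its exit status or -1.  */
static int
fork_and_use (void)
{
  pid_t pid = fork ();
  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      use_resource ();   /* the prepare handler ensured the lock was free */
      _exit (0);
    }
  int status;
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

With one mutex this is cheap, but extending it to malloc() and stdio would mean funneling every allocation and every stream operation in every thread through such locks.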

Another thought is to open a cloexec pipe from child to parent.  If the
child encounters any failure, then instead of calling error() in the
child, we call write() to pass the (untranslated) message back to the
parent; the parent listens on the pipe, receives any untranslated
message, and calls error(_()) on the result.  That removes all malloc()
and stdio from the child.  This is probably easier to implement, and it
has the benefit of not slowing down the common case by wrapping every
other unsafe action in a mutex.  But it does add fd pressure: at least
2 fds if all children can share one error pipe, and up to 2*threads if
we need one pipe per thread that calls fork.
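A sketch of the pipe approach, with one deliberate simplification: instead of a message string, the child sends its errno value as an int, and the parent is assumed to do the formatting and translation (for example via error(_())). The helper name spawn_report_errno() is hypothetical. A successful exec closes the close-on-exec write end, so the parent sees EOF; a failed exec delivers the errno instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run PATH with ARGV.  Returns 0 if exec succeeded,
   otherwise an errno value (from the child, or from the parent's own
   pipe/fork/waitpid failure).  *STATUS receives the waitpid status.  */
static int
spawn_report_errno (const char *path, char *const argv[], int *status)
{
  int fds[2];
  if (pipe (fds) < 0)
    return errno;
  /* Mark both ends close-on-exec.  Another thread could fork between
     pipe() and fcntl() and inherit these fds; pipe2 (fds, O_CLOEXEC),
     where available, closes that window.  */
  if (fcntl (fds[0], F_SETFD, FD_CLOEXEC) < 0
      || fcntl (fds[1], F_SETFD, FD_CLOEXEC) < 0)
    {
      int err = errno;
      close (fds[0]);
      close (fds[1]);
      return err;
    }

  pid_t pid = fork ();
  if (pid < 0)
    {
      int err = errno;
      close (fds[0]);
      close (fds[1]);
      return err;
    }
  if (pid == 0)
    {
      close (fds[0]);
      execv (path, argv);
      /* exec failed: report errno using only async-signal-safe calls;
         no error(), gettext(), malloc() or stdio in the child.  */
      int err = errno;
      ssize_t ignored = write (fds[1], &err, sizeof err);
      (void) ignored;
      _exit (127);
    }

  close (fds[1]);
  int child_err = 0;
  ssize_t n;
  do
    n = read (fds[0], &child_err, sizeof child_err);
  while (n < 0 && errno == EINTR);
  close (fds[0]);
  /* EOF (n == 0) means a successful exec closed the write end.  */
  if (n != (ssize_t) sizeof child_err)
    child_err = 0;

  while (waitpid (pid, status, 0) < 0)
    if (errno != EINTR)
      return errno;
  return child_err;
}
```

This version uses one pipe per fork, which keeps reports from concurrently forked children separate at the cost of two fds per in-flight child.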

Are we worried enough about the potential for a malloc mutex deadlock
that we should go ahead and use one of these methods to protect sort
from deadlock or other undefined behavior in the child process?

Eric Blake   address@hidden    +1-801-349-2682
Libvirt virtualization library

