Strange cfexecd/cfagent interaction

From: David J. Bianco
Subject: Strange cfexecd/cfagent interaction
Date: 20 Aug 2002 12:07:02 -0400

I've noticed that on about 7 of my machines (out of about 150), cfexecd
seems unable to run the cfagent process when it starts up.  Here's
what I see in my syslog:

Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]:  cfengine defines no system administrator address
Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]:  Need: sysadm = ( address@hidden ) in control

Now, I use the same config files on each of my hosts, and the same
binaries too, architecture permitting.  None of my other hosts complain
and a manual check of the cfagent.conf file shows that I do define
my email address there properly.  I even get tons of reports emailed to
me from the other machines, but not these malfunctioning 7.

On the machines that malfunction, an strace of cfexecd when it starts
up shows the following excerpt:

[pid   997] close(0)                    = 0
[pid   997] getpid()                    = 997
[pid   997] rt_sigaction(SIGRT_0, {SIG_DFL}, NULL, 8) = 0
[pid   997] rt_sigaction(SIGRT_1, {SIG_DFL}, NULL, 8) = 0
[pid   997] rt_sigaction(SIGRT_2, {SIG_DFL}, NULL, 8) = 0
[pid   997] execve("/var/cfengine/sbin/cfagent", ["/var/cfengine/sbin/cfagent", "-z"], [/* 57 vars */]) = 0
[pid   997] fcntl(0, F_GETFD)           = -1 EBADF (Bad file descriptor)
[pid   997] --- SIGSEGV (Segmentation fault) ---
<... read resumed> "", 4096)            = 0
--- SIGCHLD (Child exited) ---

Translation: cfexecd tried to exec cfagent -z.  Cfagent started, but
before main() was invoked the process initialization routine tried
to see if stdin should be preserved across the exec.  Stdin was already
closed, though, so the fcntl() call failed with EBADF and the process
segfaulted before cfagent really had a chance to run.
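
One way to check how fcntl() itself behaves on a closed descriptor 0,
separately from cfagent, is a small standalone probe.  The sketch below
is only an illustration and is not taken from the cfengine source; the
function name probe_fcntl_on_closed_stdin is made up.  It runs the check
in a child process so that a crash there can be told apart from a
normal error return.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe, not part of cfengine.
   Returns 0 if fcntl(0, F_GETFD) on a closed descriptor 0 fails with EBADF,
   1 if it behaves some other way, and -1 if the probe could not run or the
   child did not exit normally (for example, it was killed by a signal). */
int probe_fcntl_on_closed_stdin(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int rc;

        close(0);                    /* same state the strace shows before execve */
        errno = 0;
        rc = fcntl(0, F_GETFD);      /* the call seen right after execve */
        _exit((rc == -1 && errno == EBADF) ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A return value of 0 on an affected host would suggest that fcntl() is
reporting the error cleanly and that the crash happens in whatever code
runs after it.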

I traced this down to one line in cfpopen.c which seemed to be the 
trigger for this behavior, line 89:

if (pid == 0)
   {
   switch (*type)
      {
      case 'r':

          close(pd[0]);        /* Don't need output from parent */

          if (pd[1] != 1)
             {
             dup2(pd[1],1);    /* Attach pp=pd[1] to our stdout */
             dup2(pd[1],2);    /* Merge stdout/stderr */
             }
          /* rest of this case not shown */
      }
   }


The close(pd[0]) call is the line that actually closes stdin for the
newly created child process.  If I comment it out, cfagent runs
beautifully.  If I leave it in, it bombs when cfexecd starts up.
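
A plausible reason that closing the read end of the pipe ends up closing
stdin is that pipe() hands out the lowest free descriptors, so if the
calling process already has descriptor 0 closed, the read end comes back
as 0.  That cfexecd is in that state on the affected hosts is an
assumption here, not something verified against its source.  A minimal
sketch of the mechanism, with a made-up function name:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of cfengine.
   Reports which descriptor pipe() returns as the read end when descriptor 0
   is free.  The work is done in a child so the caller's own stdin is left
   alone.  Returns the descriptor number, or -1 on failure. */
int pipe_read_end_with_stdin_closed(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        int pd[2];

        close(0);                /* assumed daemon state: no stdin */
        if (pipe(pd) != 0)
            _exit(255);
        _exit(pd[0]);            /* read end gets the lowest free descriptor */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    int code = WEXITSTATUS(status);
    return code == 255 ? -1 : code;
}
```

If this returns 0, then a later close() on the read end in the forked
child is the same thing as close(0), which matches the strace excerpt.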

Now, I would argue that this is probably a bug in whatever startup code
makes that fcntl() call: the trace shows fcntl returning -1 with errno
set to EBADF, so the caller should handle the error rather than just
segfaulting.  Still, this code has been failing on more than one OS.
The 7 machines it has trouble on are a mixture of HP-UX, Linux and
Solaris.

Has anyone else seen this?  What would the implications be of *not*
closing the child's stdin before execing cfagent?  My brief analysis
leads me to believe that it would be pretty safe, but I haven't looked
into every call to cfpopen() in all parts of the code.  

Anyway, I'm not sure what the final fix for this is, but it seems 
that keeping stdin open might be a good one.
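
For comparison, another common way to make sure the child has a valid
descriptor 0 at exec time is to point it at /dev/null when it turns out
to be closed.  This is only a sketch of that general technique, not a
tested patch for cfpopen.c; ensure_stdin_open is a made-up name, and
where such a call would belong in the child branch is left open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of cfengine.
   If descriptor 0 is closed, reopen it on /dev/null.
   Returns 0 if descriptor 0 is open on return, -1 otherwise. */
int ensure_stdin_open(void)
{
    if (fcntl(0, F_GETFD) != -1)
        return 0;                    /* descriptor 0 is already open */
    if (errno != EBADF)
        return -1;                   /* some other, unexpected failure */

    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;

    if (fd != 0) {                   /* open() normally returns 0 here already */
        if (dup2(fd, 0) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

Called in the child just before the exec, this would leave the new
process with stdin readable but empty, whichever descriptor the pipe
happened to use.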


David J. Bianco, GSEC           <address@hidden>
Thomas Jefferson National Accelerator Facility

     The views expressed herein are solely those of the author and
            not those of SURA/Jefferson Lab or the US DOE.

