bug-cfengine

Re: Strange cfexecd/cfagent interaction


From: Tim Auckland
Subject: Re: Strange cfexecd/cfagent interaction
Date: 20 Aug 2002 09:25:00 -0700

I think you'll see the EBADF on the systems that are working too. 
That's quite normal on an exec.

I would suspect this is another instance of cfexecd's pthread running
out of stack space.  This happened a lot in the betas of 2.0.0, but
should be fixed by now.  As with any memory problem, commenting out
an unrelated line of code can sometimes "fix" the problem.

Take a look at the thread initialisation code in cfexecd, and try giving
the thread more stack space, or try compiling without thread support, and
see if that makes any difference.
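
For anyone who wants to try the stack-space experiment, here is a minimal sketch of what "more stack space" looks like with plain POSIX threads. This is not cfexecd's actual code: start_thread_with_stack(), sketch_worker() and SKETCH_STACK_BYTES are hypothetical names made up for illustration, and the 1 MB figure is only a guess to start from.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical starting value; the right size depends on the platform. */
#define SKETCH_STACK_BYTES ((size_t) 1024 * 1024)

/* Hypothetical helper: start fn in a new thread with an explicit stack
   size.  Returns 0 on success or a pthread error number. */
int start_thread_with_stack(pthread_t *tid, void *(*fn)(void *),
                            void *arg, size_t stack_bytes)
{
    pthread_attr_t attr;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
    {
        return rc;
    }

    /* Fails with EINVAL if stack_bytes is below the platform minimum */
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
    {
        rc = pthread_create(tid, &attr, fn, arg);
    }

    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical worker with a large local buffer, the kind of frame that
   can overflow a small default thread stack. */
void *sketch_worker(void *arg)
{
    char scratch[256 * 1024];

    memset(scratch, 0, sizeof scratch);
    *(int *) arg = (scratch[sizeof scratch - 1] == 0);
    return NULL;
}
```

Calling start_thread_with_stack(&tid, sketch_worker, &flag, SKETCH_STACK_BYTES) and then pthread_join(tid, NULL) exercises it. If the crash goes away with a bigger value, that points at stack exhaustion rather than at fcntl.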

Tim

On Tue, 2002-08-20 at 09:07, David J. Bianco wrote:
> I've noticed that on about 7 of my machines (out of about 150), cfexecd
> seems unable to run the cfagent process when it starts up.  Here's
> what I see in my syslog:
> 
> Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]:  cfengine defines no system administrator address
> Aug 20 11:01:53 xxx.jlab.org cfexecd[26729]:  Need: sysadm = ( address@hidden ) in control
> 
> Now, I use the same config files on each of my hosts, and the same
> binaries too, architecture permitting.  None of my other hosts complain
> and a manual check of the cfagent.conf file shows that I do define
> my email address there properly.  I even get tons of reports emailed to
> me from the other machines, but not these malfunctioning 7.
> 
> On the machines that malfunction, an strace of cfexecd when it starts
> up shows the following excerpt:
> 
> [pid   997] close(0)                    = 0
> [pid   997] getpid()                    = 997
> [pid   997] rt_sigaction(SIGRT_0, {SIG_DFL}, NULL, 8) = 0
> [pid   997] rt_sigaction(SIGRT_1, {SIG_DFL}, NULL, 8) = 0
> [pid   997] rt_sigaction(SIGRT_2, {SIG_DFL}, NULL, 8) = 0
> [pid   997] execve("/var/cfengine/sbin/cfagent", ["/var/cfengine/sbin/cfagent", "-z"], [/* 57 vars */]) = 0
> [pid   997] fcntl(0, F_GETFD)           = -1 EBADF (Bad file descriptor)
> [pid   997] --- SIGSEGV (Segmentation fault) ---
> <... read resumed> "", 4096)            = 0
> --- SIGCHLD (Child exited) ---
> 
> Translation: cfexecd tried to exec cfagent -z.  Cfagent started, but
> before main() was invoked the process initialization routine tried 
> to see if stdin should be preserved across the exec.  Stdin was already
> closed, though, so fcntl() segfaulted before cfagent really had a chance
> to run.  
> 
> I traced this down to one line in cfpopen.c which seemed to be the 
> trigger for this behavior, line 89:
> 
> if (pid == 0)
>     {
>     switch (*type)
>        {
>        case 'r':
> 
>            /* THIS CLOSE IS THE TRIGGER LINE FOR THE BUG */
>            close(pd[0]);        /* Don't need output from parent */
> 
>            if (pd[1] != 1)
>               {
>               dup2(pd[1],1);    /* Attach pp=pd[1] to our stdout */
>               dup2(pd[1],2);    /* Merge stdout/stderr */
>               close(pd[1]);
>               }
> 
>            break;
> 
> The strace shows this call as close(0), so pd[0] must be descriptor 0
> here, and this is the line that actually closes stdin for the newly
> created child process.  If I comment it out, cfagent runs beautifully.
> If I leave it in, it bombs when cfexecd starts up.
> 
> Now, I would argue that this is probably a bug in fcntl, since it
> should do some sort of error checking and return a -1 with errno
> set, rather than just segfaulting.  Still, this code has been failing
> on more than one OS.  The 7 machines it has trouble on are a mixture
> of HP-UX, Linux and Solaris. 
> 
> Has anyone else seen this?  What would the implications be of *not*
> closing the child's stdin before execing cfagent?  My brief analysis
> leads me to believe that it would be pretty safe, but I haven't looked
> into every call to cfpopen() in all parts of the code.  
> 
> Anyway, I'm not sure what the final fix for this is, but it seems 
> that keeping stdin open might be a good one.
> 
>       David
> 
> 
> -- 
> David J. Bianco, GSEC         <address@hidden>
> Thomas Jefferson National Accelerator Facility
> 
>      The views expressed herein are solely those of the author and
>           not those of SURA/Jefferson Lab or the US DOE.
> 
> 
> 

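On David's question about the child's stdin: pipe() hands out the lowest free descriptors, so if the parent is already running with descriptor 0 closed, pd[0] can come back as 0, which would match the close(0) in the strace. One way to keep the close(pd[0]) and still give the exec'd program a valid stdin is to reopen descriptor 0 on /dev/null just before the exec. The following is only a sketch under that assumption, not the cfengine fix; ensure_stdin_open() is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of cfengine: make sure descriptor 0 is
   open, pointing it at /dev/null if it was closed.
   Returns 0 on success, -1 on failure. */
int ensure_stdin_open(void)
{
    int fd;

    if (fcntl(0, F_GETFD) != -1)
    {
        return 0;                   /* descriptor 0 is already open */
    }

    if (errno != EBADF)
    {
        return -1;                  /* unexpected failure, leave things alone */
    }

    fd = open("/dev/null", O_RDONLY);
    if (fd == -1)
    {
        return -1;
    }

    if (fd != 0)                    /* open() normally returns 0 here, since
                                       it is the lowest free descriptor */
    {
        if (dup2(fd, 0) == -1)
        {
            close(fd);
            return -1;
        }
        close(fd);
    }

    return 0;
}
```

In a cfpopen-style child this would be called after the dup2() calls and before the exec, so the new program reads end-of-file from stdin instead of getting EBADF.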




