Re: make-3.79 on solaris8 broken


From: Kevin Nomura
Subject: Re: make-3.79 on solaris8 broken
Date: Mon, 19 Nov 2001 13:51:05 -0800

Yes, I meant to mention that this was seen under NFS, and only on
Solaris, even though we do an equivalent amount of banging on NFS
from Linux and Alpha clients.

Kevin




Howard Chu wrote:
> 
> I've seen this kind of problem before in other programs, but usually only on
> NFS-mounted filesystems. Generally on local UFS partitions the system calls
> are atomic. It would be simpler if we could use sigaction() and set the
> SA_RESTART flag for these signals, but the Solaris man pages don't mention
> stat() as being one of the restartable system calls. (But I'd bet that it
> is...)
> 
>   -- Howard Chu
>   Chief Architect, Symas Corp.       Director, Highland Sun
>   http://www.symas.com               http://highlandsun.com/hyc
>   Symas: Premier OpenSource Development and Support
> 
> > -----Original Message-----
> > From: address@hidden [mailto:address@hidden] On Behalf Of
> > Kevin Nomura
> > Sent: Monday, November 19, 2001 1:07 PM
> > To: address@hidden
> > Subject: make-3.79 on solaris8 broken
> >
> >
> > Using make-3.79 under solaris 6 and solaris 8, I have been seeing
> > two intermittent problems.  It seems to get worse with higher values
> > of -j.   One is "No rule to make target xxx" when there is, in fact,
> > a rule to make target xxx.  As befits an intermittent problem, the
> > make succeeds if rerun with no changes.
> >
> > The second problem is more insidious: make *quietly* fails to rebuild
> > some of its targets that are out of date.  The symptom is link errors
> > with unsat symbols owing to the incomplete build.  Again, rerunning
> > make picks these up and succeeds.  Since this is a chronic problem for
> > us I spent this past weekend debugging it with make -d and have some
> > theories to offer.
> >
> > The first problem seems to be due to the stat() in remake.c not being
> > protected by a retry loop for EINTR.  stat() on Solaris is documented as
> > able to fail with EINTR.  So I fixed this by actually implementing the
> > "safe_stat()" function that has a prototype in make.h but no definition
> > (!?).  This cleared up the "No rule" errors but not the unsat link problems.
> >
> > For the second problem with failed links, the -d trace surrounding one of
> > the files that should have been remade (but was not) looked like:
> >
> >         Considering target file `../netcache/server/obj/td/wccp2.o'.
> >          Looking for an implicit rule for `../netcache/server/obj/td/wccp2.o'.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.r'.
> > Got a SIGCHLD; 1 unreaped children.
> > Got a SIGCHLD; 2 unreaped children.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.f'.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/wccp2.c'.
> > Got a SIGCHLD; 3 unreaped children.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/wccp2.cpp'.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/wccp2.c'.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/wccp2.c'.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.c'.
> >          Trying pattern rule with stem `wccp2'.
> >          Trying implicit prerequisite `../netcache/server/obj/td/wccp2.cc'.
> >          Trying pattern rule with stem `wccp2'.
> > ...
> >          No implicit rule found for `../netcache/server/obj/td/wccp2.o'.
> > ...
> >         No commands for `../netcache/server/obj/td/wccp2.o' and no prerequisites actually changed.
> >         No need to remake target `../netcache/server/obj/td/wccp2.o'.
> >
> > Seeing that a signal arrived right about the time make was checking
> > the prerequisite `../netcache/server/wccp2.c' (the source file, which
> > does exist), I zeroed in on the readdir() in
> > dir.c:dir_contents_file_exists_p().  Now, readdir() is not documented
> > in Solaris 6 or Solaris 8 as failing with EINTR.  But I put in a retry
> > loop anyway and CAUGHT readdir() failing with EINTR, dozens of times
> > in the build in fact.  So with stat() and readdir() (and opendir() and
> > some others for good measure) guarded by retry loops, the problems
> > have now subsided.
> >
> > So assuming these are in fact the causes of the problems I saw, I am
> > wondering whether solaris is in error for returning EINTR (e.g. is this
> > broken with respect to POSIX or some standard that Solaris claims
> > adherence to)?  Should either or both of these be solved within make,
> > at least as a practical issue?
> >
> > Kevin Nomura
> > Network Appliance
> >
> > _______________________________________________
> > Bug-make mailing list
> > address@hidden
> > http://mail.gnu.org/mailman/listinfo/bug-make
> >
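
For readers following the thread, the retry loops described in the original
message might look roughly like the sketch below.  This is not the patch
that was applied to make; the names stat_retry(), opendir_retry() and
readdir_retry() are hypothetical, chosen only for illustration.  The one
subtle point is readdir(), which returns NULL both at end of directory and
on error, so errno has to be cleared before each call to tell the two apart.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: call stat() again whenever it fails with EINTR. */
static int stat_retry(const char *path, struct stat *st)
{
    int rc;

    do {
        rc = stat(path, st);
    } while (rc == -1 && errno == EINTR);
    return rc;
}

/* Hypothetical helper: call opendir() again whenever it fails with EINTR. */
static DIR *opendir_retry(const char *path)
{
    DIR *dir;

    do {
        dir = opendir(path);
    } while (dir == NULL && errno == EINTR);
    return dir;
}

/* Hypothetical helper: readdir() returns NULL at end of directory and on
   error alike, so clear errno first.  On return, NULL with errno == 0
   means end of directory; NULL with errno != 0 means a real error. */
static struct dirent *readdir_retry(DIR *dir)
{
    struct dirent *ent;

    do {
        errno = 0;
        ent = readdir(dir);
    } while (ent == NULL && errno == EINTR);
    return ent;
}
```

A caller would substitute these for the bare stat(), opendir() and readdir()
calls and treat any remaining failure as a genuine error rather than as
"file does not exist".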
