help-make

Re: GNU Make on Linux Feeding All Commands Through ksh


From: Steve Waltner
Subject: Re: GNU Make on Linux Feeding All Commands Through ksh
Date: Thu, 4 Dec 2008 12:08:55 -0600

On Dec 4, 2008, at 11:28 AM, David Boyce wrote:

On Thu, Dec 4, 2008 at 11:05 AM, Steve Waltner <address@hidden> wrote:
After two months, I'm finally looking into this issue again. Gotta get it
working by the end of the year, since migrating builds to Linux (more
specifically, to the faster x86 hardware) is one of my business objectives.

Somewhat off topic: Solaris is now FOSS and runs on the same X86
hardware as Linux. Thus there may be good reasons to convert to Linux
but access to faster X86 hardware is not a sufficient one.

I presume you know this and have additional reasons for the switch but
wanted to point it out for the record/archives.

You are correct that going to Solaris x86 would be the better solution: we would get the performance gains of the x86 hardware without dealing with the Linux/Solaris compatibility issues I'm seeing. Unfortunately, the toolset we use to build (VxWorks from WindRiver) is only available on Solaris SPARC, Linux x86, and Windows. Obviously, moving to Windows would be a monumental undertaking given all the Unix-based scripts used during the build, so that wasn't considered. Moving to Linux seemed like the easiest way to get the speed boost, but it is proving to be a bit of a problem. I have asked WindRiver about a Solaris x86 release of their software in the past; maybe it's time to ping them again. It would have been better to ping them 6 weeks ago, before we sent them a PO for licenses for the next four years, though. "Port your software, and get the cash..." :-)

I do remember
the developer that did most of the work on the makefiles making the comment
about /bin/sh on Solaris being junk and switching to /bin/ksh.

That reasoning made sense on Solaris but may have a problem now, given
that you're moving to Linux, because /bin/ksh on Linux is *also* junk.
[snip] Fortunately Solaris has been bundling
bash for quite a long time, so perhaps the most robust and portable
arrangement for you would be to settle on SHELL=/bin/bash.

I'll investigate using bash (as well as CentOS and Ubuntu as mentioned by Galen) to see if it behaves any differently.

The main question that remains is: is there a way to debug and follow the token check-in/check-out process that GNU make uses internally, to see what's going on here? I can work on tracking down what's going wrong, but without visibility into the process I'd just be making random changes to the makefiles, which isn't going to be very productive.

Sorry, can't help directly with your main problem since I haven't
worked much with make -j. Since you're building your own make anyway
it shouldn't be too hard to insert some debugging printfs. Or if you
want to be really aggressive you could build a Solaris 10 machine and
install Linux in a "zone" (semi-virtualization concept), then use
dtrace to track what's happening with the job server. Possibly even
strace would help on native Linux.

I don't remember if this was mentioned upthread but presumably you've
read http://make.paulandlesley.org/jobserver.html for background? If
not, probably a good idea.
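A minimal sketch of the kind of debugging printfs suggested above, assuming the jobserver model from the document mentioned just above: a pipe holds one byte per free job slot, a job takes a slot by reading a byte, and returns it by writing the byte back. The names token_acquire, token_release and token_pool_init are hypothetical and are not GNU make's internal functions; the point is only that logging every read and write of the pipe makes the token check-in/check-out flow, and any lost token, visible.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Take one token (one byte) from the pipe, logging to stderr.
 * Blocks until a token is available. Returns the token byte or -1. */
static int token_acquire(int read_fd, const char *who)
{
    char tok;
    ssize_t n;

    do {
        n = read(read_fd, &tok, 1);
    } while (n < 0 && errno == EINTR);  /* retry if a signal interrupted us */

    if (n != 1) {
        fprintf(stderr, "[jobserver] %s: read failed (n=%zd, errno=%d)\n",
                who, n, n < 0 ? errno : 0);
        return -1;
    }
    fprintf(stderr, "[jobserver] %s: took token '%c'\n", who, tok);
    return tok;
}

/* Give a token back by writing it to the pipe, logging to stderr.
 * Returns 0 on success, -1 on failure. */
static int token_release(int write_fd, char tok, const char *who)
{
    ssize_t n;

    do {
        n = write(write_fd, &tok, 1);
    } while (n < 0 && errno == EINTR);

    if (n != 1) {
        fprintf(stderr, "[jobserver] %s: write failed (n=%zd, errno=%d)\n",
                who, n, n < 0 ? errno : 0);
        return -1;
    }
    fprintf(stderr, "[jobserver] %s: returned token '%c'\n", who, tok);
    return 0;
}

/* Create the pipe and preload it with `slots` tokens ('+').
 * Assumes `slots` is small enough to fit in the pipe buffer. */
static int token_pool_init(int fds[2], int slots)
{
    assert(slots >= 0);
    if (pipe(fds) != 0)
        return -1;
    for (int i = 0; i < slots; i++)
        if (token_release(fds[1], '+', "init") != 0)
            return -1;
    return 0;
}
```

Calling token_pool_init(fds, 2), then token_acquire twice, token_release once and token_acquire again prints one log line per token movement; a token that is taken and never returned shows up as an "took" line with no matching "returned" line.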

Hmm... as I think about it, the whole jobserver technique depends on
downstream processes leaving those file descriptors open. If anybody
messes with the FD_CLOEXEC flag or closes them explicitly, you might
see the behavior described. I've seen programs that do something like

 for (i = 3; i < maxfds; i++) close(i);

before an exec, just for the heck of it. I've already mentioned that
pdksh is crap; I wonder if it's doing something like that? Wait, no,
you said you took /bin/ksh out and it still broke ... anyway, I'd try
strace or similar to see if the jobserver pipe's file descriptors are
being closed. Note that this is all based on a memory of the jobserver
document; I have not read it closely lately.
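To test the theory that something closes the jobserver pipe or marks it close-on-exec, a sketch like the following (assuming a POSIX system) classifies a descriptor with fcntl(fd, F_GETFD) and forks a child that copies the close-everything loop quoted above, bounded at the descriptor of interest instead of maxfds. fd_inspect, state_seen_by_child and the FDS_* names are hypothetical; against a real build you would inspect the jobserver descriptors that make actually passes down (for example by watching them with strace) rather than a pipe created by the test itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* What a process can observe about one descriptor. */
enum fd_state {
    FDS_CLOSED,       /* not open at all (fcntl fails with EBADF)     */
    FDS_INHERITABLE,  /* open, and would survive an exec()            */
    FDS_CLOEXEC,      /* open now, but FD_CLOEXEC closes it at exec() */
    FDS_ERROR         /* fcntl failed for some other reason           */
};

static enum fd_state fd_inspect(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return errno == EBADF ? FDS_CLOSED : FDS_ERROR;
    return (flags & FD_CLOEXEC) ? FDS_CLOEXEC : FDS_INHERITABLE;
}

/* Fork a child that optionally mimics the "close every fd >= 3" habit
 * and report, through its exit status, what the child sees for `fd`.
 * No exec() happens here, so FD_CLOEXEC alone closes nothing in this
 * demo; it only shows up as FDS_CLOEXEC. Returns -1 on failure. */
static int state_seen_by_child(int fd, int close_high_fds)
{
    assert(fd >= 0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (close_high_fds)
            for (int i = 3; i <= fd; i++)
                close(i);
        _exit((int)fd_inspect(fd));  /* _exit: skip parent's stdio buffers */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Expected behavior: for a freshly created pipe, a child that skips the loop reports FDS_INHERITABLE, a child that runs it reports FDS_CLOSED, and setting FD_CLOEXEC in the parent shows up as FDS_CLOEXEC. A child reporting FDS_CLOSED for the jobserver descriptors would be the symptom described above.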


I read through the jobserver web page two years ago, when we switched our builds from "-j --max-load=4" to "-j 4" at the same time as we moved them from the servers everyone uses for interactive jobs to a cluster of dedicated build servers. We did originally have several issues in the makefiles that needed to be fixed with regard to how make called itself recursively to run the build.

I'll do some testing with strace and possibly recompile GNU make with some printfs to see if that provides any insight. Your comment about something (possibly ksh) closing file handles may be exactly what's going on here. I let a "gmake -j 100" job run to completion on the Linux server. It too eventually degraded to a single-threaded build, but it took a lot longer than the "-j 32" builds I would normally run. This build exited with the following warning:

gmake: INTERNAL: Exiting with 1 jobserver tokens available; should be 100!


So, something is definitely interfering with the jobserver when the build is run on Linux and consuming tokens that should only be used by GNU make.
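The warning can be illustrated with a toy model (an assumption for illustration, not GNU make's actual accounting code): fill a pipe with one token per slot, run jobs that each take a token and hand it back, let one job leak its token, and count what is left at the end. run_job and count_free_tokens are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* A toy job: take one token, "work", then give it back, unless it is
 * the leaky kind that exits still holding its token. */
static int run_job(int fds[2], int leaks)
{
    char tok;
    if (read(fds[0], &tok, 1) != 1)   /* blocks if no token is free */
        return -1;
    /* ... the job's real work would happen here ... */
    if (!leaks && write(fds[1], &tok, 1) != 1)
        return -1;
    return 0;
}

/* Drain the pipe without blocking and count the tokens found. The
 * drained tokens are not written back, so call this only at the end. */
static int count_free_tokens(int read_fd)
{
    assert(read_fd >= 0);
    int flags = fcntl(read_fd, F_GETFL);
    if (flags == -1 || fcntl(read_fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    int count = 0;
    char tok;
    for (;;) {
        ssize_t n = read(read_fd, &tok, 1);
        if (n == 1) {
            count++;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        break;  /* EAGAIN: pipe empty; 0: no writers left; else an error */
    }
    fcntl(read_fd, F_SETFL, flags);  /* restore the original status flags */
    return count;
}
```

With four tokens, two well-behaved jobs and one leaky job, count_free_tokens returns 3: the same kind of shortfall as "Exiting with 1 jobserver tokens available; should be 100!". In this model each leaked token permanently removes one parallel slot, which fits a build that slowly slides toward single-threaded.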

Thank you, everyone, for the detailed responses. I will do some digging and let you know what I find out.

Steve

