make-alpha

Re: GSoC: OOD detection


From: David Boyce
Subject: Re: GSoC: OOD detection
Date: Sun, 29 Apr 2007 19:09:17 -0400

At 02:54 PM 4/14/2007, Paul Smith wrote:
> Hi all; sorry this is a bit slow.  Soccer season is starting and I'm
> very busy!  This email will be a bit of a brain dump so please bear with
> me.

Likewise. Busy, distracted, skimming this thread where I should be studying it, etc....

> Before anything else, it's important to realize that there are two
> distinct, yet interdependent capabilities we are discussing here: the
> first is the ability to use a separate algorithm for OOD determination,
> and that's the one we've been talking about so far.  But the second one
> is at least as important and, in my opinion, the more challenging
> design-wise: that is stateful make; the ability to keep some state
> information across invocations of make.  I don't think there are too
> many OOD algorithms that you can choose that wouldn't require persistent
> state.  Deciding how to store that state, especially when you don't
> really know what format it will be in (obviously the state to be stored
> will vary with the OOD algorithm chosen), provide it to the OOD
> algorithm, etc. is a design challenge.

I should preface my remarks by acknowledging that many of you, certainly Paul, have been thinking about make much longer and harder than I have, so I may well be misunderstanding or oversimplifying some issues. But I hope you'll at least consider my argument that it doesn't need to be as complicated as this.

The way I've always imagined this is that make would deliberately *not* address the problem of persistence. Instead it would defer that to the particular OOD override, which at least has the virtue of laziness (per Larry Wall). Let's start by considering the "competition": ClearCase is certainly the best-known and most widely deployed tool that currently offers advanced OOD detection, and it keeps its persistent data in a network database. So let's say I or someone else wants to implement full ClearCase-like functionality using GNU make. A database may well be preferable to sentinel files in such an environment. In fact, if we want to be able to share our stateful knowledge with someone not operating in the exact same file tree, a database may be necessary.

In fact, let's take this to its logical conclusion: what if someone reverse-engineered the CC network protocol and wanted to tap into its database for OOD decisions? I have no plan (or hope) of doing this, but it's not inconceivable that the vendor might contribute an implementation, and in any case it illustrates the point that make need not handle its own persistence.

Let's consider a possible design off the top of my head. Say we define a struct containing all potentially pertinent data for OOD decisions (using short names for now because I'm a bad typist):

typedef struct ood {
        int ood_version;
        char **ood_targets;             // vector of paths to targets
        char **ood_prereqs;             // vector of paths to prereqs
        char **ood_envp;                // traditional environ vector
        char *ood_cwd;                  // current working directory
        char *ood_script;               // the build script
} ood_t;

That's all I can think of that would affect a go/no-go decision, but by using a struct and storing ood_version we give it a faint OO gloss, which allows for extensibility in case something else turns up. Make would always call the ood() function and pass it the above struct whenever such a determination needs to be made. The default algorithm would simply compare the dates of the targets to those of the prereqs, in which case state is handled for us as it always has been. If an override algorithm is detected, the override function not only has all the information it needs for the OOD decision, it has enough data to store that state too[*], which it could keep in a file, send down a socket, or whatever.
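For illustration, here is a minimal sketch of what the default ood() might look like when handed such a struct. It assumes POSIX stat(), NULL-terminated vectors, and a return convention invented for this example (1 = out of date, 0 = up to date, -1 = a prereq can't be stat'ed); the name ood_default is hypothetical, not anything in GNU make.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical OOD request; vectors are assumed to be NULL-terminated. */
typedef struct ood {
    int ood_version;
    char **ood_targets;   /* paths to targets */
    char **ood_prereqs;   /* paths to prereqs */
    char **ood_envp;      /* traditional environ vector */
    char *ood_cwd;        /* current working directory */
    char *ood_script;     /* the build script */
} ood_t;

/* Classic make rule: out of date if any target is missing or any prereq
 * is newer than the oldest target.
 * Returns 1 = out of date, 0 = up to date, -1 = prereq cannot be stat'ed. */
int ood_default(const ood_t *o)
{
    struct stat st;
    time_t oldest = 0;
    int ntargets = 0;

    assert(o != NULL && o->ood_targets != NULL && o->ood_prereqs != NULL);

    for (char **t = o->ood_targets; *t != NULL; t++, ntargets++) {
        if (stat(*t, &st) != 0)
            return 1;                     /* missing target: must build */
        if (ntargets == 0 || st.st_mtime < oldest)
            oldest = st.st_mtime;         /* track the oldest target */
    }
    if (ntargets == 0)
        return 1;                         /* nothing to compare against */

    for (char **p = o->ood_prereqs; *p != NULL; p++) {
        if (stat(*p, &st) != 0) {
            fprintf(stderr, "ood: cannot stat prerequisite '%s'\n", *p);
            return -1;
        }
        if (st.st_mtime > oldest)
            return 1;                     /* prereq newer than a target */
    }
    return 0;
}
```

The sketch compares whole seconds via st_mtime for brevity; GNU make itself uses finer-grained timestamps where the platform provides them.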

[*] I see the first flaw already, which is that, as Paul said, the state must be stored *after* the build script is run, while the OOD decision is made before it. So this basically means you'd need an ood_pre() and an ood_post() function.

To my mind, dependence on sentinel/stamp files is at least arguably a hack and I'd prefer that the design didn't require them. I also think anything that keeps the core of GNU make simpler is a good thing. Of course, pushing persistence off to the user might make these extensions a little more complex to implement, but you could deal with that by taking the same code you'd use for storing file-based state and putting it into a documented library instead of linking it into the make program.
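A rough sketch of such a helper library, assuming an override only wants to remember each prereq's size and modification time. The function names ood_state_save() and ood_state_changed(), and the one-line-per-prereq file format, are hypothetical; a real library would likely offer content hashes (MD5 or similar) as well.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper library for OOD overrides that want file-based state.
 * Assumed state-file format: one "<size> <mtime> <path>" line per prereq. */

/* Record the size and mtime of each prereq (NULL-terminated vector).
 * Returns 0 on success, -1 on any error. */
int ood_state_save(const char *state_path, char *const prereqs[])
{
    assert(state_path != NULL && prereqs != NULL);
    FILE *fp = fopen(state_path, "w");
    if (fp == NULL)
        return -1;
    for (size_t i = 0; prereqs[i] != NULL; i++) {
        struct stat st;
        if (stat(prereqs[i], &st) != 0) {
            fclose(fp);
            return -1;
        }
        fprintf(fp, "%lld %lld %s\n", (long long)st.st_size,
                (long long)st.st_mtime, prereqs[i]);
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Returns 1 if the prereqs differ from what state_path recorded (including
 * a missing state file or a changed prereq list), 0 if everything matches. */
int ood_state_changed(const char *state_path, char *const prereqs[])
{
    assert(state_path != NULL && prereqs != NULL);
    FILE *fp = fopen(state_path, "r");
    if (fp == NULL)
        return 1;                         /* no state yet: treat as changed */

    long long size, mtime;
    char path[4096];
    size_t i = 0;
    int changed = 0;

    while (fscanf(fp, "%lld %lld %4095[^\n]", &size, &mtime, path) == 3) {
        struct stat st;
        if (prereqs[i] == NULL || strcmp(path, prereqs[i]) != 0
            || stat(path, &st) != 0
            || (long long)st.st_size != size
            || (long long)st.st_mtime != mtime) {
            changed = 1;
            break;
        }
        i++;
    }
    if (!changed && prereqs[i] != NULL)
        changed = 1;                      /* fewer records than prereqs */
    fclose(fp);
    return changed;
}
```

An override would call ood_state_changed() for its go/no-go decision and ood_state_save() once the build script has succeeded, without make itself knowing the format.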

Expanding on a previous point: it seems impossible to encode details such as "connect to port 9382 on machine foobar and send the names and MD5 sums of the prereqs down the wire, then let me know what answer comes back" in a make variable like .OUT_OF_DATE. You'd basically be forced to write a little client program to do so and run it with $(shell), which would lead to performance issues, especially on Windows, which is not optimized for quick, cheap process creation. OTOH, I do see that the ability to use target-specific settings would be quite elegant.

So, to sum up, my argument is that OOD is conceptually pretty simple: (1) find all the places where datestamp comparison is done now and route them all through one API, and (2) come up with a way for that API to be interposed. Am I missing something important?
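Points (1) and (2) could be sketched as a single entry point that dispatches through a replaceable function pointer. Everything here is hypothetical: ood_check(), ood_set_override(), the OOD_FORCE=1 example override, and the built-in algorithm, which is stubbed to report "up to date" so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Trimmed-down request struct (hypothetical); vectors are NULL-terminated. */
typedef struct ood {
    int ood_version;
    char **ood_targets;
    char **ood_prereqs;
    char **ood_envp;
} ood_t;

/* Shared signature for every OOD algorithm: 1 = out of date, 0 = up to date. */
typedef int (*ood_fn)(const ood_t *req);

/* Stand-in for make's built-in timestamp comparison. Stubbed to report
 * "up to date" so this sketch is self-contained. */
static int ood_builtin(const ood_t *req)
{
    (void)req;
    return 0;
}

static ood_fn ood_current = ood_builtin;

/* (2) The interposition point: install an override, or pass NULL to restore
 * the built-in. Returns the previous algorithm so an override can chain to it. */
ood_fn ood_set_override(ood_fn fn)
{
    ood_fn prev = ood_current;
    ood_current = (fn != NULL) ? fn : ood_builtin;
    return prev;
}

/* (1) The one API every former datestamp comparison would call. */
int ood_check(const ood_t *req)
{
    assert(req != NULL);
    return ood_current(req);
}

/* Example override (hypothetical): force a rebuild when OOD_FORCE=1 is in
 * the environment vector, otherwise defer to the previous algorithm. */
static ood_fn ood_prev = ood_builtin;

static int ood_force_via_env(const ood_t *req)
{
    for (char **e = req->ood_envp; e != NULL && *e != NULL; e++)
        if (strcmp(*e, "OOD_FORCE=1") == 0)
            return 1;
    return ood_prev(req);
}
```

An override would be installed with `ood_prev = ood_set_override(ood_force_via_env);`. Returning the previous function is a design choice that lets an override handle only the cases it cares about and chain to whatever it replaced for the rest.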

David B

PS: Both my model and yours would appear to suffer from an obvious race condition: what if something changes one or more prereqs between the "pre" moment (when the OOD determination is made) and the "post" moment (when state is stored), either as the result of a badly designed build script or what ClearCase calls "interference from another process"? It seems some transitional state must be stored within the make process. Maybe, building on your idea, the ood_pre() function could return a char pointer that would be null if the target is up to date and a valid string otherwise. This string is then passed into the ood_post() call for it to use as desired. The typical use would be to remember the size/date/MD5 of the prereqs from the pre call and check that they're unchanged in the post call.
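A minimal sketch of that pre/post protocol, assuming POSIX stat() and using size and mtime in place of MD5 to keep it short. ood_pre() returns NULL when the targets are up to date, otherwise a heap-allocated snapshot of the prereqs; ood_post() re-snapshots after the build script has run and reports whether anything changed in between. The names, the snapshot format, and the return conventions are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Trimmed-down request struct (hypothetical); vectors are NULL-terminated. */
typedef struct ood {
    int ood_version;
    char **ood_targets;
    char **ood_prereqs;
} ood_t;

/* Die on out-of-memory instead of returning NULL, since NULL means "up to date". */
static void *xrealloc(void *p, size_t n)
{
    void *q = realloc(p, n);
    if (q == NULL) {
        fputs("ood: out of memory\n", stderr);
        abort();
    }
    return q;
}

/* Snapshot every prereq as "size:mtime\n", or "-\n" if it cannot be stat'ed. */
static char *ood_snapshot(char **prereqs)
{
    size_t len = 0;
    char *buf = xrealloc(NULL, 1);
    buf[0] = '\0';
    for (char **p = prereqs; *p != NULL; p++) {
        struct stat st;
        char line[64];
        int n = (stat(*p, &st) == 0)
            ? snprintf(line, sizeof line, "%lld:%lld\n",
                       (long long)st.st_size, (long long)st.st_mtime)
            : snprintf(line, sizeof line, "-\n");
        buf = xrealloc(buf, len + (size_t)n + 1);
        memcpy(buf + len, line, (size_t)n + 1);
        len += (size_t)n;
    }
    return buf;
}

/* Returns NULL if the targets are up to date; otherwise a heap-allocated
 * token describing the prereqs, to be passed to ood_post() after the build. */
char *ood_pre(const ood_t *o)
{
    struct stat st;
    time_t oldest = 0;
    int ntargets = 0;

    assert(o != NULL);
    for (char **t = o->ood_targets; *t != NULL; t++, ntargets++) {
        if (stat(*t, &st) != 0)
            return ood_snapshot(o->ood_prereqs);   /* missing target */
        if (ntargets == 0 || st.st_mtime < oldest)
            oldest = st.st_mtime;
    }
    if (ntargets == 0)
        return ood_snapshot(o->ood_prereqs);
    for (char **p = o->ood_prereqs; *p != NULL; p++)
        if (stat(*p, &st) != 0 || st.st_mtime > oldest)
            return ood_snapshot(o->ood_prereqs);   /* missing or newer prereq */
    return NULL;
}

/* Call after the build script has run. Returns 1 if the prereqs still match
 * the pre-build token (safe to store state), 0 if something changed them in
 * between ("interference"). Frees the token. */
int ood_post(const ood_t *o, char *token)
{
    assert(o != NULL && token != NULL);
    char *now = ood_snapshot(o->ood_prereqs);
    int unchanged = (strcmp(token, now) == 0);
    free(now);
    free(token);
    return unchanged;
}
```

Because mtimes have one-second resolution here, an edit that keeps the file size the same within the same second would slip through; that is exactly the case an MD5 comparison would catch.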





