Re: CVS diff and unknown files.

From: Paul Sander
Subject: Re: CVS diff and unknown files.
Date: Sat, 29 Jan 2005 13:09:59 -0800

On Jan 28, 2005, at 8:50 AM, address@hidden wrote:

Paul Sander <address@hidden> writes:
On Jan 27, 2005, at 1:07 AM, address@hidden wrote:
Just to understand your point better, do you propose 'cvs add -c
new_file' and 'cvs ci new_file' run exactly the same set of triggers?
Different sets?

I think the consensus in the last iteration of the topic was that, if
add-time triggers were implemented, they would run as a client-side
option at add-time and be obligatory at commit time, preceding
commit-time triggers. Doubling the overhead was the only way we came
up with at the time that guaranteed that the add-time triggers fired
at least once prior to the first commit of a new file.

Well, so the answer to my question is "the two trigger sets are
different". But it in turn means that it could be the case that I won't
be able to 'cvs ci new_file' even though 'cvs add -c new_file' just said
the file is OK to be added. IMHO that's a mess. And I believe this mess
is a direct consequence of the poor design choices you are advocating, sorry.

A commitinfo trigger can always refuse a file for which "cvs add" completed successfully. There's nothing new here. If you don't like that, then don't use commitinfo.

What's new is that we can cause "cvs add" to fail under certain conditions that we know in advance will cause commitinfo to refuse a file. To make sure that the condition is checked, we do it a second time before commitinfo, because some users insist on defeating it the first time.

Another point that I would like to make, however, is that CVS can
become more feature-rich, support multiple policies, and have a
simpler user interface (in most use cases, at least) all at the same
time. The problem is that most of us here are too close to the
implementation to take a fresh view of the problem.

In the core CVS program the "cvs add" and "cvs remove" operations
must be fixed to be equivalent to "vi file.c" -- i.e. operations
which _ONLY_ affect the local working directory and which _NEVER_
contact the server.

See above. If there are no add-time triggers, then I can live with
what you say. On the other hand, some shops REQUIRE add-time triggers,
and if add-time triggers are used then contacting the server is
REQUIRED to make them run.

Sorry, add-time with respect to what? Add-time w.r.t. the working copy
is entirely different from add-time w.r.t. the repository. Do you
realize there is currently no command but 'cvs ci' meaning "add this
file to the repository"? Add-time w.r.t. the repository currently
happens when you 'cvs commit' the new file. Do you propose to change this?

What I mean by "add-time" is the moment in which the user invokes the "cvs add" command. The canonical example of such a trigger is one that enforces naming conventions, but there are other reasons to control what the user can add. (Some shops don't want users creating directories, for example, and require the CM team to do it for them.)

So, provided that "cvs add" adds the file to the working copy, your
definition means "add-time to working copy". Requiring server triggers
to be run at add-time to working copy seems a rather strange design choice
in my opinion.

Can you suggest another way to enforce add-time policy in a way that's centrally controlled by the CVS admin?

When "cvs add" runs, I don't care whether or not the CVS server
modifies the repository. Some people seem to think that running "cvs
add" while disconnected is a useful thing to do. (I take the attitude
that client/server applications shouldn't be expected to run without a
working network, so I'd never run any CVS command when I was unable to
connect to the server.)

Very strange attitude. And a very unusual definition of expectations
associated with client/server applications. Having this definition, mail
client isn't supposed to let you do anything with the mail on your
computer without a working network?!

I rely heavily on IMAP and SMTP for mail transport. At work, my mail client lives on an NFS server. So no, I don't expect mail to work when the network is down.

BTW, the definition of "client/server" implies the presence of a network for communication between the two parts, without which the application fails. The other method is usually called "monolithic", in which the application is self-contained. CVS works both ways, but most of us prefer a remote repository, hence the client/server implementation.

However, since triggers enforce policy and must not be defeatable,
e.g. by changing one's path or bypassing a wrapper or hacking the
client machine, they're best implemented as a server feature. It's for
that reason, assuming add-time triggers are implemented, they require
a connection to the server.

The "add-time triggers" you are advocating would then create an
unfortunate precedent of preventing the user from modifying his working
copy at the will of the CVS repository administrator! PLEASE, DON'T DO IT TO CVS!
Sorry for shouting. Fortunately, as triggers are run on the server,
there is in fact no way to actually impose any policy on the client

The CVS administrator guards what enters the repository. Users typically don't do things that they know will be refused by the repository; it's a waste of time and effort. The add-time triggers simply bring to the attention of the users earlier in the cycle that their code will be refused later anyway.

What I'm seeing is a knee-jerk reaction of "ooooh, he wants to limit what I can do in my own workspace!" or "he wants to change the behavior of a command that I use!". But think about what's really going on. If the work will be refused anyway, why would you possibly want to deliberately proceed toward a dead end? If you really, really want to go that route, the add-time test is optional anyway.

What you do privately in your workspace isn't my concern. It's just that once you make CVS aware of the file, you must become aware that certain policies begin to apply. I find it more efficient to detect violations early because history has shown that problems are easier, faster, and cheaper to fix when found earlier than when found later.

In fact the semantics of proposed 'cvs add -c' is: "add this file to the
working copy and check if the repository will allow me to commit this
new file if I decide to". In this semantic only the latter part has
anything to do with the repository, and it is "commit", not "add",
that's why I suggested 'cvs commit' would be more logical place to check
for such things.

Certain conditions that "will allow me to commit this new file if I
decide to" can be checked at the time the user invokes the "cvs add"
command. The rationale is that if a failure condition can be detected
at add-time then any conditions deriving from those creating the
failure condition can be halted, thus avoiding costly recovery action
at the time when the first commit actually fails.

The problem is that you still try at add-to-the-working-copy time to
impose policies that in fact make sense at the add-to-repository time.

There's nothing I can say here that I haven't said before: The commit will fail anyway, so why not find out early and fix it while it's still easy?

For example, suppose a mixed Windows/Unix shop requires all files to
have upper-case 8.3 file names. A new programmer splits a header file
and creates a new foo.h. He adds the file, then proceeds to modify all
of the source files to include the new foo.h. Then he updates all of
the dependencies in his makefiles. He builds and tests on Unix. He
types "cvs commit", types a very detailed log of his actions, and
finally punches the screen. This person might have saved a day's work,
his equipment, and his knuckles if only "cvs add" had said "Sorry,
this new file violates our naming policy, try renaming it to FOO.H
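
To make the naming-policy example concrete, here is a minimal sketch of the kind of check such a trigger might perform. Everything in it is an assumption made for illustration: the function name check_name, the exact reading of "upper-case 8.3" (1 to 8 characters, optionally a dot and 1 to 3 more, drawn from A-Z, digits and underscore), and the wording of the messages. It is not an existing CVS hook or interface.

```python
import re

# Hypothetical policy: upper-case 8.3 names, i.e. a base of 1-8 characters
# and an optional extension of 1-3 characters. The allowed character set
# (A-Z, 0-9, underscore) is an assumption, not something CVS defines.
_NAME_8_3 = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def check_name(filename):
    """Return (ok, message) for a proposed new file name."""
    if _NAME_8_3.fullmatch(filename):
        return True, ""
    # Suggest the upper-cased name only if that alone would satisfy the policy.
    suggestion = filename.upper()
    if _NAME_8_3.fullmatch(suggestion):
        return False, (f"{filename}: violates naming policy, "
                       f"try renaming it to {suggestion}")
    return False, f"{filename}: violates naming policy (upper-case 8.3 required)"
```

Under these assumptions, check_name("foo.h") refuses the file and suggests FOO.H, which is the message the programmer in the example would have wanted to see at add-time.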

... and what? Is the file added to the working copy or not? If not, then
if I've specified a few files, are they all not added, or only those
that didn't pass the test? Still I'm opposed to the idea that "cvs add"
would refrain from adding files to the working copy. It's not repository
administrator business to decide what I can and what I can't do to my
working copy. If "cvs add" will only warn about the problems, -- that's
OK with me as a user.

Seriously, think about what you're saying. You want to do the following, dramatically oversimplified:

edit foo.h
cvs add foo.h
edit foo.h
cc FOO.C
cvs commit foo.h

It's at this point you want to fail, after you've done all the work. In my opinion it's better all around if you fail after two steps, not five.

Sure, you can change your behavior and move the add down to right before the commit, and that is consistent with the working styles of many people, but then they get what they ask for.

However, I still believe "cvs add" is the wrong place to implement this
functionality from the design point of view. I can only repeat my
argument: as "cvs ci new_file" will do the checks anyway, that's what
"cvs -n ci new_file" should do, not "cvs add". The same opportunities
for the user, but a cleaner and simpler design.

But it's a lousy user model because it allows the user to proceed a long way down a dead end before turning him around.

To get your semantics, it seems you need a new operation with the
semantics "add the file to the working copy and to the repository, but
don't give it to anybody on 'cvs update' yet, until I later commit the
addition". Do you propose exactly this? How could it be done without
write access to the repository?

The semantics I want are to validate the addition of a new artifact. I frankly don't care if "cvs add" is implemented as you describe in that last paragraph or if it's implemented Greg's way.

Maybe you don't care, but implementing a messy design will eventually
result in a mess for the end user. Then you will probably care.

The change to the CVS design really isn't significant. It can already contact the server, and it already implements server-side triggers. Adding a new one really has no significant impact to the quality of its implementation. And as I've demonstrated above, it actually improves the user model.

No matter which method records the new file, the client must still
contact the server to run the add-time trigger.

Sure, the only difference is that you require add-to-repository trigger
to be run at add-time-to-working-copy time as well, and that's IMHO a
wrong design decision.

Okay, use the option to disable the add-time trigger. Or, don't use add-time triggers at all.

I had hoped that this was clear in the last go-round, but apparently

For me it is not, sorry. Let me give yet another example. Suppose I've
created a new_file today and have checked it's ok using proposed 'cvs
add -c new_file' command. Two days later what I've checked could already
be wrong (policy change, another user added the same file, etc.). So
there are two questions:

1. How do I repeat my check later? By just repeating 'cvs add -c
   new_file'? This would produce warnings "new_file has already been
   added" that is not a good thing for an operation intended to make
   checks, I'm afraid.

2. Should 'cvs -n ci new_file' run the same triggers 'cvs add -c
   new_file' runs? If exactly the same, then why the duplication?

My vision is that the user would run "cvs ci new_file" without the -n

If this is supposed to be an answer to my first question, then it is not
in fact an acceptable answer. I asked "how do I repeat the check?" and
you've answered "by committing your changes to the repository". Should I
take it as "no, you will have no easy way to repeat the check later"?

Okay, here's a different tack: *You* don't perform the check. CVS performs the check on your behalf. When you run the "cvs add" command, what would happen below the covers is this:

Run add-time trigger
If trigger succeeds, create entry in Entries file

When you invoke "cvs commit" at a later time, CVS performs the following steps:

Run add-time trigger
If add-time trigger succeeds, run commit-time trigger
If both triggers succeed, create new version in repository and update Entries
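
The two sequences above can be sketched roughly as follows. This is only a model of the proposed behavior, not CVS code: the function names, the use of plain sets to stand in for the Entries file and the repository, and the check parameter (standing in for the option to skip the add-time trigger) are all assumptions made for illustration.

```python
def cvs_add(filename, entries, add_trigger, check=True):
    """Model of the proposed "cvs add": trigger first, then record the file.

    check=False models the optional client-side skip of the add-time
    trigger (for example when working disconnected).
    """
    if check and not add_trigger(filename):
        return False          # trigger refused: Entries is left untouched
    entries.add(filename)     # create entry in the (modelled) Entries file
    return True


def cvs_commit(filename, entries, repository, add_trigger, commit_trigger):
    """Model of the proposed "cvs commit" for a single file."""
    if filename not in entries:
        return False          # nothing was added under this name
    first_commit = filename not in repository
    # The add-time trigger is mandatory on the first commit of a new file,
    # whether or not it already ran when the file was added.
    if first_commit and not add_trigger(filename):
        return False
    if not commit_trigger(filename):
        return False
    repository.add(filename)  # create the new version in the repository
    return True
```

In this model a user who skipped the check at add-time still cannot get a refused file into the repository, because the add-time trigger fires again before the commit-time trigger.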

It would run the add-time triggers (for the first commit) and then the
commit-time triggers.

Will "cvs -n ci new_file" run the triggers in the design you have in

It would do exactly what "cvs commit" would do without the -n option, except that it would not modify any persistent state in the repository or in the workspace.

The reason for the duplication is partly to catch possible changes in
policy between add-time and commit-time, and partly to avoid abuses by
those who unplug their computers as part of their procedure to add new
files (which includes those who insist that running "cvs add" while
disconnected is a reasonable thing to do).

Running "cvs add" is indeed a reasonable thing to do no matter if
connected or not. When I'm sitting at the other end of the planet from
the server running the repository I don't wish to depend on the
existence of the link between my computer and the server when I decide
the file should be added to the project. The more I can do offline, the
better; the ability to do things independently is one of the properties
of distributed systems that make them superior in many cases.

Well, there's also the founding philosophy of the whole client/server revolution, which is that data islands are bad.

But I also recognize that, if you must work offline, it's better to have "cvs add" succeed in isolation than to fail (so that the user does not need to remember over a long period of time to add the file next time he connects). It's for this reason that the add-time trigger is optional when "cvs add" runs, but is mandatory at commit time.

Whatever the reasons for duplication are, the fact is that the server
must run add-time triggers when a new file is being added to the
repository. Adding files to repository is what "cvs commit" does. Then
it's logical for "cvs -n commit" to run the triggers as well. Then,
provided you already have a way to run these add-time triggers through
"cvs -n commit", why do you need yet another way to perform the same
operation (through "cvs add")? Thus, the duplication is not in fact
required, -- it's the wrong design choice you are advocating that leads
to the duplication.

The reason to have "cvs add" do it is because it saves a step and makes a single action (from the user's point of view) whole. There should be no reason in the world why the user must invoke two commands to complete a single action, to add a file in this case.

Policy goes above -- it is not hard-coded in the core functionality.

Agreed. But the tool must be sufficiently flexible to allow robust
implementations of policies. Sometimes triggers are the right way (and
wrapper scripts are not), and we've identified one area here where CVS
is not sufficiently flexible.

That's why I insist CVS should have a sane set of elementary operations
that could then be combined in different ways. I have no objection
to "compound" CVS commands that do multiple things (preferably
making them atomic), but the existence of corresponding elementary
commands is a must, I believe. Designing tools in a different manner
results in a lack of flexibility, or at least my experience suggests it does.

Indeed.  To me, the ideal implementation would be a collection of very
primitive operations glued together by a scripting language, and have the
command line interface invoke scripts.

... and then you suggest "cvs add" semantics that instead of doing one
thing, adding files to the working copy, will in fact do two things, --
adding files to the working copy and checking if the commitment of these
files later would be OK. It seems you finally contradict your own
goals, sorry.

Not a bit. Logical primitives and user operations are two different things. Creating a new revision in the repository is a logical primitive. Invoking a trigger is a logical primitive. Updating the Entries file is a logical primitive. The "cvs commit" command is a user operation that does all of these things. There's a big difference between these concepts, and I think that they should be kept separate. And I don't think that every logical primitive should necessarily be exposed to the user, though APIs to logical primitives are useful in building new user operations.

Don't you read your own signatures by the way?: "To do two things at
once is to do neither".

In 100 B.C., the Roman philosopher Publilius Syrus was writing about human multitasking (which is kind of an oxymoron because people are not very good at thinking about two things at once). I'm pretty certain that he was not thinking about adding steps to a sequence that's hidden beneath a user interface.

I read the other one, too, which appears at the bottom of this message. In this case, the "old mistake" is neglecting to detect failure conditions early enough. Let's just get past this one and move on to the next one. File renaming is always a popular topic. (Note tongue planted firmly in cheek.)

I don't believe that the existing CVS command line offers enough
flexibility, in a number of ways. And an implementation like the one I
describe here cannot be built without trashing most of the existing
code. But that's a different discussion...

If what you have in mind will trash most of the existing code, then
it's obviously better to take a fresh start and design the tools from
scratch. But that won't be the CVS anymore, I'm afraid.

Yeah, and I'm not recommending (at this time) that CVS be rewritten as a library with an embedded scripting language. I'm just recommending that add-time triggers be added. The latter is easy, it's a huge win for those who need it, and has zero impact on those who don't.

Paul Sander | "Lets stick to the new mistakes and get rid of the old
address@hidden | ones" -- William Brown
