Re: CVS diff and unknown files.

info-cvs

From:	Paul Sander
Subject:	Re: CVS diff and unknown files.
Date:	Sat, 29 Jan 2005 13:09:59 -0800


On Jan 28, 2005, at 8:50 AM, address@hidden wrote:

Paul Sander <address@hidden> writes:

On Jan 27, 2005, at 1:07 AM, address@hidden wrote:

[...]

Just to understand your point better, do you propose 'cvs add -c
new_file' and 'cvs ci new_file' run exactly the same set of triggers?
Different sets?


I think the consensus in the last iteration of the topic was that, if
add-time triggers were implemented, they would run as a client-side
option at add-time and be obligatory at commit time, preceding
commit-time triggers. Doubling the overhead was the only way we came
up with at the time that guaranteed that the add-time triggers fired
at least once prior to the first commit of a new file.


Well, so the answer to my question is "the two trigger sets are
different". But it in turn means that it could be the case that I won't
be able to 'cvs ci new_file' even though 'cvs add -c new_file' just said
the file is OK to be added. IMHO that's a mess. And I believe this mess
is a direct consequence of the poor design choices you are advocating,
sorry.

A commitinfo trigger can always refuse a file for which "cvs add" completed successfully. There's nothing new here. If you don't like that, then don't use commitinfo.

What's new is that we can cause "cvs add" to fail under certain conditions that we know in advance will cause commitinfo to refuse a file. To make sure that the condition is checked, we do it a second time before commitinfo, because some users insist on defeating it the first time.

Another point that I would like to make, however, is that CVS can
become more feature-rich, support multiple policies, and have a
simpler user interface (in most use cases, at least) all at the same
time. The problem is that most of us here are too close to the
implementation to take a fresh view of the problem.

In the core CVS program the "cvs add" and "cvs remove" operations
must be fixed to be equivalent to "vi file.c" -- i.e. operations
which _ONLY_ affect the local working directory and which _NEVER_
contact the server.

See above. If there are no add-time triggers, then I can live with
what you say. On the other hand, some shops REQUIRE add-time triggers,
and if add-time triggers are used then contacting the server is
REQUIRED to make them run.

Sorry, add-time with respect to what? Add-time w.r.t. the working copy
is entirely different from add-time w.r.t. the repository. Do you
realize that currently there is no command but 'cvs ci' meaning "add
this file to the repository"? Add-time w.r.t. the repository currently
happens when you 'cvs commit' the new file. Do you propose to change
this?

What I mean by "add-time" is the moment in which the user invokes the
"cvs add" command. The canonical example of such a trigger is one that
enforces naming conventions, but there are other reasons to control
what the user can add. (Some shops don't want users creating
directories, for example, and require the CM team to do it for them.)

So, provided that "cvs add" adds the file to the working copy, your
definition means "add-time to working copy". Requiring server triggers
to be run at add-time to working copy seems a rather strange design
choice in my opinion.

Can you suggest another way to enforce add-time policy in a way that's centrally controlled by the CVS admin?

When "cvs add" runs, I don't care whether or not the CVS server
modifies the repository. Some people seem to think that running "cvs
add" while disconnected is a useful thing to do. (I take the attitude
that client/server applications shouldn't be expected to run without a
working network, so I'd never run any CVS command when I was unable to
connect to the server.)


Very strange attitude. And a very unusual definition of expectations
associated with client/server applications. Having this definition, a
mail client isn't supposed to let you do anything with the mail on your
computer without a working network?!

I rely heavily on IMAP and SMTP for mail transport. At work, my mail client lives on an NFS server. So no, I don't expect mail to work when the network is down.

BTW, the definition of "client/server" implies the presence of a network for communication between the two parts, without which the application fails. The other method is usually called "monolithic", in which the application is self-contained. CVS works both ways, but most of us prefer a remote repository, hence the client/server implementation.

However, since triggers enforce policy and must not be defeatable,
e.g. by changing one's path or bypassing a wrapper or hacking the
client machine, they're best implemented as a server feature. It's for
that reason, assuming add-time triggers are implemented, they require
a connection to the server.


The "add-time triggers" you are advocating would then create an
unfortunate precedent of preventing the user from modifying his working
copy at the will of the CVS repository administrator! PLEASE, DON'T DO
IT TO CVS! Sorry for shouting. Fortunately, as triggers are run on the
server, there is in fact no way to actually impose any policy on the
client side.

The CVS administrator guards what enters the repository. Users typically don't do things that they know will be refused by the repository; it's a waste of time and effort. The add-time triggers simply bring to the attention of the users earlier in the cycle that their code will be refused later anyway.

What I'm seeing is a knee-jerk reaction of "ooooh, he wants to limit what I can do in my own workspace!" or "he wants to change the behavior of a command that I use!". But think about what's really going on. If the work will be refused anyway, why would you possibly want to deliberately proceed toward a dead end? If you really, really want to go that route, the add-time test is optional anyway.

What you do privately in your workspace isn't my concern. It's just that once you make CVS aware of the file, you must become aware that certain policies begin to apply. I find it more efficient to detect violations early because history has shown that problems are easier, faster, and cheaper to fix when found earlier than when found later.

In fact the semantics of the proposed 'cvs add -c' is: "add this file
to the working copy and check if the repository will allow me to commit
this new file if I decide to". In this semantics only the latter part
has anything to do with the repository, and it is "commit", not "add",
that's why I suggested 'cvs commit' would be a more logical place to
check for such things.


Certain conditions that "will allow me to commit this new file if I
decide to" can be checked at the time the user invokes the "cvs add"
command. The rationale is that if a failure condition can be detected
at add-time then any conditions deriving from those creating the
failure condition can be halted, thus avoiding costly recovery action
at the time when the first commit actually fails.


The problem is that you still try at add-to-the-working-copy time to
impose policies that in fact make sense at the add-to-repository time.

There's nothing I can say here that I haven't said before: The commit will fail anyway, so why not find out early and fix it while it's still easy?

For example, suppose a mixed Windows/Unix shop requires all files to
have upper-case 8.3 file names. A new programmer splits a header file
and creates a new foo.h. He adds the file, then proceeds to modify all
of the source files to include the new foo.h. Then he updates all of
the dependencies in his makefiles. He builds and tests on Unix. He
types "cvs commit", types a very detailed log of his actions, and
finally punches the screen. This person might have saved a day's work,
his equipment, and his knuckles if only "cvs add" had said "Sorry,
this new file violates our naming policy, try renaming it to FOO.H
instead."
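
The naming check in this example could be written as a small script. What follows is only a rough sketch in Python, not anything CVS ships: the function name check_name, the regular expression, and the exact policy (upper-case letters, digits and underscores, 8.3 length limits) are all assumptions made up for illustration, and the suggested replacement name is deliberately naive.

```python
import re

# Hypothetical policy: upper-case 8.3 names, i.e. 1-8 characters,
# optionally followed by a dot and a 1-3 character extension.
_NAME_8_3 = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def check_name(filename):
    """Return None if the name passes the policy, else a message for the user."""
    if _NAME_8_3.fullmatch(filename):
        return None
    # Naive suggestion: truncate to 8.3 and upper-case. It is not
    # guaranteed to pass the policy itself (e.g. names containing '-').
    stem, dot, ext = filename.partition(".")
    suggestion = (stem[:8] + dot + ext[:3]).upper()
    return (f"Sorry, {filename} violates our naming policy, "
            f"try renaming it to {suggestion} instead.")
```

Wired in as an add-time trigger, a check like this would reject foo.h at "cvs add" time and accept FOO.H, which is the early failure the paragraph above argues for.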

... and what? Is the file added to the working copy or not? If not, then
if I've specified a few files, are they all not added, or only those
that didn't pass the test? Still I'm opposed to the idea that "cvs add"
would refrain from adding files to the working copy. It's not the
repository administrator's business to decide what I can and what I
can't do to my working copy. If "cvs add" will only warn about the
problems, -- that's OK with me as a user.

Seriously, think about what you're saying. You want to do the following, dramatically oversimplified:


edit foo.h
cvs add foo.h
edit foo.h
cc FOO.C
cvs commit foo.h

It's at this point you want to fail, after you've done all the work. In my opinion it's better all around if you fail after two steps, not five.

Sure, you can change your behavior and move the add down to right before the commit, and that is consistent with the working styles of many people, but then they get what they ask for.

However, I still believe "cvs add" is the wrong place to implement the
functionality from the design point of view. I can only repeat my
argument: as "cvs ci new_file" will do the checks anyway, that's what
"cvs -n ci new_file" should do, not "cvs add". The same opportunities
for the user, but a cleaner and simpler design.

But it's a lousy user model because it allows the user to proceed a long way down a dead end before turning him around.

To get your semantics, it seems you need a new operation with the
semantics "add the file to the working copy and to the repository, but
don't give it to anybody on 'cvs update' yet, until I later commit the
addition". Do you propose exactly this? How could it be done without
write access to the repository?

The semantics I want are to validate the addition of a new artifact. I
frankly don't care if "cvs add" is implemented as you describe in that
last paragraph or if it's implemented Greg's way.

Maybe you don't care, but implementing a messy design will eventually
result in a mess for the end-user. Then you will probably care.

The change to the CVS design really isn't significant. It can already contact the server, and it already implements server-side triggers. Adding a new one really has no significant impact on the quality of its implementation. And as I've demonstrated above, it actually improves the user model.

No matter which method records the new file, the client must still
contact the server to run the add-time trigger.


Sure, the only difference is that you require add-to-repository trigger
to be run at add-time-to-working-copy time as well, and that's IMHO a
wrong design decision.

Okay, use the option to disable the add-time trigger. Or, don't use add-time triggers at all.

I had hoped that this was clear in the last go-round, but apparently
not.

For me it is not, sorry. Let me give yet another example. Suppose I've
created a new_file today and have checked it's ok using the proposed
'cvs add -c new_file' command. Two days later what I've checked could
already be wrong (policy change, another user added the same file,
etc.). So there are two questions:

1. How do I repeat my check later? By just repeating 'cvs add -c
   new_file'? This would produce warnings "new_file has already been
   added" that is not a good thing for an operation intended to make
   checks, I'm afraid.

2. Should 'cvs -n ci new_file' run the same triggers 'cvs add -c
   new_file' runs? If exactly the same, then why the duplication?


My vision is that the user would run "cvs ci new_file" without the -n
option.

If this is supposed to be an answer to my first question, then it is not
in fact an acceptable answer. I asked "how do I repeat the check?" and
you've answered "by committing your changes to the repository". Should I
take it as "no, you will have no easy way to repeat the check later"?

Okay, here's a different tack: *You* don't perform the check. CVS performs the check on your behalf. When you run the "cvs add" command, what would happen below the covers is this:


Run add-time trigger
If trigger succeeds, create entry in Entries file

When you invoke "cvs commit" at a later time, CVS performs the following steps:


Run add-time trigger
If add-time trigger succeeds, run commit-time trigger
If both triggers succeed, create new version in repository and update Entries
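
The two sequences above can be sketched in a few lines of Python. This is an illustration only, not CVS code: cvs_add, cvs_commit, and their parameters are hypothetical names, the triggers are modeled as plain functions returning True or False, and Entries and the repository are modeled as sets. The check flag stands in for the option to skip the add-time trigger at add time, and dry_run stands in for "cvs -n".

```python
def cvs_add(path, add_trigger, entries, check=True):
    """Model of 'cvs add': run the add-time trigger (optional here),
    then record the file in the Entries set."""
    if check and not add_trigger(path):
        return False          # refused early, nothing recorded
    entries.add(path)
    return True


def cvs_commit(path, add_trigger, commit_trigger, entries, repository,
               dry_run=False):
    """Model of 'cvs commit': the add-time trigger is mandatory for the
    first commit of a new file and runs before the commit-time trigger."""
    is_new = path not in repository
    if is_new and not add_trigger(path):
        return False          # add-time policy refused the new file
    if not commit_trigger(path):
        return False          # commit-time policy refused the change
    if not dry_run:
        # Only a real commit modifies persistent state.
        repository.add(path)
        entries.add(path)
    return True
```

With a trigger that refuses lower-case names, cvs_add("foo.h", ...) fails right away, cvs_add("foo.h", ..., check=False) succeeds offline, and the later cvs_commit still refuses the file because the add-time trigger runs again there.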

It would run the add-time triggers (for the first commit) and then the
commit-time triggers.


Will "cvs -n ci new_file" run the triggers in the design you have in
mind?

It would do exactly what "cvs commit" would do without the -n option, except that it would not modify any persistent state in the repository or in the workspace.

The reason for the duplication is partly to catch possible changes in
policy between add-time and commit-time, and partly to avoid abuses by
those who unplug their computers as part of their procedure to add new
files (which includes those who insist that running "cvs add" while
disconnected is a reasonable thing to do).


Running "cvs add" is indeed a reasonable thing to do no matter if
connected or not. When I'm sitting on the other end of our planet from
the server running the repository I don't wish to depend on the
existence of the link between my computer and the server when I decide
the file should be added to the project. The more I can do offline, --
the better, -- ability to do things independently is one of the
properties of distributed systems that make them superior in many cases.

Well, there's also the founding philosophy of the whole client/server revolution, which is that data islands are bad.

But I also recognize that, if you must work offline, it's better to have "cvs add" succeed in isolation than to fail (so that the user does not need to remember over a long period of time to add the file next time he connects). It's for this reason that the add-time trigger is optional when "cvs add" runs, but is mandatory at commit time.

Whatever the reasons for duplication are, the fact is that the server
must run add-time triggers when new file is being added to the
repository. Adding files to repository is what "cvs commit" does. Then
it's logical for "cvs -n commit" to run the triggers as well. Then,
provided you already have a way to run these add-time triggers through
"cvs -n commit", why do you need yet another way to perform the same
operation (through "cvs add")? Thus, the duplication is not in fact
required, -- it's the wrong design choice you are advocating that leads
to the duplication.

The reason to have "cvs add" do it is because it saves a step and makes a single action (from the user's point of view) whole. There should be no reason in the world why the user must invoke two commands to complete a single action, to add a file in this case.

Policy goes above -- it is not hard-coded in the core functionality.

Agreed. But the tool must be sufficiently flexible to allow robust
implementations of policies. Sometimes triggers are the right way (and
wrapper scripts are not), and we've identified one area here where CVS
is not sufficiently flexible.

That's why I insist CVS should have a sane set of elementary operations
that could then be combined in different ways. I have no objection
against "compound" CVS commands that do multiple things (preferably
making them atomic), but the existence of corresponding elementary
commands is a must, I believe. Designing tools in a different manner
results in a lack of flexibility, or at least my experience suggests it
does.

Indeed.  To me, the ideal implementation would be a collection of very
primitive operations glued together by a scripting language, and have
the command line interface invoke scripts.

... and then you suggest "cvs add" semantics that instead of doing one
thing, adding files to the working copy, will in fact do two things, --
adding files to the working copy and checking if the commitment of these
files later would be OK. It seems you finally contradict your own
goals, sorry.

Not a bit. Logical primitives and user operations are two different things. Creating a new revision in the repository is a logical primitive. Invoking a trigger is a logical primitive. Updating the Entries file is a logical primitive. The "cvs commit" command is a user operation that does all of these things. There's a big difference in these concepts, and I think that they should be kept separated. And I don't think that every logical primitive should necessarily be exposed to the user, though APIs to logical primitives are useful in building new user operations.

Don't you read your own signatures by the way?: "To do two things at
once is to do neither".

In 100 B.C., the Roman philosopher Publilius Syrus was writing about human multitasking (which is kind of an oxymoron because people are not very good at thinking about two things at once). I'm pretty certain that he was not thinking about adding steps to a sequence that's hidden beneath a user interface.

I read the other one, too, which appears at the bottom of this message. In this case, the "old mistake" is neglecting to detect failure conditions early enough. Let's just get past this one and move on to the next one. File renaming is always a popular topic. (Note tongue planted firmly in cheek.)

I don't believe that the existing CVS command line offers enough
flexibility, in a number of ways. And an implementation like the one I
describe here cannot be built without trashing most of the existing
code. But that's a different discussion...


If what you have in mind will trash most of the existing code, then
it's obviously better to take a fresh start and design the tools from
scratch. But that won't be the CVS anymore, I'm afraid.

Yeah, and I'm not recommending (at this time) that CVS be rewritten as a library with an embedded scripting language. I'm just recommending that add-time triggers be added. The latter is easy, it's a huge win for those who need it, and has zero impact on those who don't.

--

Paul Sander | "Lets stick to the new mistakes and get rid of the old
address@hidden | ones" -- William Brown
