[Monotone-devel] Re: long RFC: "contexts"

From:	Jerome Fisher
Subject:	[Monotone-devel] Re: long RFC: "contexts"
Date:	Thu, 27 May 2004 07:48:55 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040514

graydon hoare wrote:

the idea -- in case you missed it in all the other VC systems! -- would be to add a textual object to monotone which describes (all at once) the contents of a number of certs and a certain amount of currently-synthetic information:

What do you mean by "currently-synthetic information"? I think you're referring to storing the changes to each parent, which currently will be derived from manifest comparison and file rename certs. However, I'd like to be sure I'm not misinterpreting this.

manifest: <manifest-sha1>
date: <contents-of-current-date-cert>
author: <contents-of-current-author-cert>
summary: "line of text"
parent: <first-parent-context-sha1> {
  manifest: <manifest-sha1>
  renames: [<filename>, <filename>]
  adds: [<filename>, <file-sha1>] ...
  dels: <filename> ...
  patches: [<filename> <file-sha1> <file-sha1>] ...
}
parent: <second-parent-context-sha1> {
  manifest: <manifest-sha1>
  renames: [<filename>, <filename>] ...
  adds: [<filename>, <file-sha1>] ...
  dels: <filename> ...
  patches: [<filename> <file-sha1> <file-sha1>] ...
}

remainder is changelog
^D

I have a few problems with this specific textual representation of a context (commas, braces, etc.), and the names of some elements, but I don't think that needs to be discussed yet. I think your main aim was in showing what information would be included, anyway.

I think that getting the definition of a context right the first time is quite important. Context definitions and IDs are going to be so pervasively used that it will be very difficult to change them in future without great disturbance. I think it's best to keep only essential information, and eliminate - as much as is practical - that which does not directly relate to the primary goals. I consider these goals to be:


- Uniquely identifying a location in the history DAG.
- Allowing the associated changes to be accurately determined.
- Allowing the resultant state to be determined.


Essential properties, as I see it:

(1) Referencing each parent context.

   - In the case of merges, this partially addresses the question of how the new state was reached.
   - It has the effect that contexts with different ancestry will have different IDs, which is more or less essential for reasons that have been covered in other mails.

(2) Specifying, for each parent context, whatever changes were performed to get from its state to the new state that CAN'T be derived by simply comparing those states.

   (currently only renames)

   - This provides a partial set of changes between states. Extra information regarding these types of changes would otherwise be lost.
   - It has the effect that changes to the same parent(s) resulting in the same new state but produced in different ways will result in different context IDs. This is almost certainly a good thing.

   - See "EXPLICIT CHANGES" below.

(3a) Specifying, for each parent context, whatever changes were performed to get from its state to the new state that CAN be derived by simply comparing those states.

   (currently adds, dels and patches)
   - This allows the full set of changes to be known immediately.

   - It's redundant if you can determine every parent's state and the new state.
   - You never have to go through the expense of working out the changes through state comparison. This speeds up operations like netsync and log.


OR

(3b) Referencing the absolute representation (manifest) of the new state.
   - This allows the new state to be known immediately.

   - It's redundant if you have full knowledge of the changes and can determine the state of one parent (if there are any parents).
   - You never have to go through the expense of applying the changes to a parent state to determine the new one.
   - It allows for the stripping of old contexts, manifests and file data to save space.
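
To illustrate why (3a) and (3b) are redundant with each other, here is a minimal Python sketch. It assumes a manifest can be modelled as a plain mapping from filename to file hash. The names `diff_manifests` and `apply_changes` and the hash values are made up for illustration, and this is not how monotone actually does it.

```python
def diff_manifests(parent, child):
    """Derive adds, dels and patches by comparing two manifests.

    A manifest is modelled (as an assumption) as a dict: filename -> file hash.
    """
    adds = {name: h for name, h in child.items() if name not in parent}
    dels = sorted(name for name in parent if name not in child)
    patches = {
        name: (parent[name], child[name])
        for name in parent
        if name in child and parent[name] != child[name]
    }
    return {"adds": adds, "dels": dels, "patches": patches}


def apply_changes(parent, changes):
    """Rebuild the new manifest from a parent manifest plus its changes."""
    child = dict(parent)
    for name in changes["dels"]:
        del child[name]
    for name, new_hash in changes["adds"].items():
        child[name] = new_hash
    for name, (old_hash, new_hash) in changes["patches"].items():
        if child.get(name) != old_hash:
            raise ValueError(f"{name}: parent hash does not match patch")
        child[name] = new_hash
    return child


# Hypothetical states: old.txt was really renamed to new.txt.
parent = {"README": "aaa1", "src/main.c": "bbb2", "old.txt": "ccc3"}
child = {"README": "aaa1", "src/main.c": "ddd4", "new.txt": "ccc3"}

changes = diff_manifests(parent, child)   # (3a) derived from the states
rebuilt = apply_changes(parent, changes)  # (3b) derived from the changes
```

Either direction recovers the other, which is the redundancy. Note that the comparison reports the rename only as a "dels" of old.txt plus an "adds" of new.txt, which is why the changes in (2) have to be recorded explicitly.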



Additional properties in your proposal:

(4) Specifying the author of the change, the author's idea of time when making the change, the author's summary of the change, and the author's full description of the change.

   - This has the effect that exactly the same changes to the same parent(s) will result in multiple nodes in the history DAG if any of these attributes differ. This will happen quite often (especially with people auto-merging), and I consider this to be unnecessary and probably bad. The "badness" suspicion is mostly gut feeling, but I'm thinking about being able to correct, append to or enhance this change metadata later - this shouldn't have to use a completely different system like certs, and certainly shouldn't result in a change of context ID.

(5) Referencing, for each parent context, the absolute representation (manifest) of its state.

   - I don't see how this is useful at all unless the parent's context is stripped or not yet downloaded, and then what do you want with the manifest ID? I think I'm missing something (I have no clue about the internals of netsync, or anything else in monotone for that matter).
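
The effect of hashing author metadata into the ID can be sketched in a few lines of Python. The serialisation below is a simplified stand-in that I made up for illustration (it is not the textual format proposed above), and `context_id` is a hypothetical helper. The only assumption that matters is that the ID is a SHA-1 over the whole context text.

```python
import hashlib


def context_id(manifest_id, parent_ids, renames, metadata=None):
    """Hash a simplified, hypothetical context text into a SHA-1 ID."""
    lines = [f"manifest: {manifest_id}"]
    # Author-supplied metadata is only hashed if it is part of the context.
    for key in sorted(metadata or {}):
        lines.append(f"{key}: {metadata[key]}")
    for parent in parent_ids:
        lines.append(f"parent: {parent}")
    for old, new in renames:
        lines.append(f"rename: {old} {new}")
    text = "\n".join(lines) + "\n"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# The same merge, performed independently by two people (made-up values).
merge = dict(manifest_id="m3", parent_ids=["c1", "c2"], renames=[])

with_alice = context_id(**merge, metadata={"author": "alice", "date": "2004-05-27"})
with_bob = context_id(**merge, metadata={"author": "bob", "date": "2004-05-28"})

# With the metadata kept outside the context (e.g. in certs), the IDs agree.
bare_first = context_id(**merge)
bare_second = context_id(**merge)
```

With the metadata inside the context, `with_alice` and `with_bob` are two different nodes for the same change. Without it, both auto-merges collapse onto one ID.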

Only one of (3a) and (3b) is strictly necessary. As each provides very important benefits, I think they should both remain.

Unless there's a good reason to have them that I'm not aware of, I think (4) and (5) are unnecessary, and in the case of (4) possibly evil.


So I would suggest:
- Removing the "manifest" field from the "parent" sections.

- Removing the "date", "author" and "summary" fields, and the changelog area.
- Attaching the "date", "author", "summary" and "changelog" information to the context independently (using certs).



EXPLICIT CHANGES

I think it's important to note that it's highly desirable to store as well as possible the changes that _were actually_ performed, not merely changes that _can be_ performed to get from one state to another. It's a subtle but important distinction. The only place where we currently recognise this is in the support of "rename". It would be possible to define rename in terms of "add" and "delete", but we would then lose important information on what the author of the change actually did.


In future, for example, we might have:

replaces: [<filename>, <file-sha1>] ...

for completely replacing a file (meaning that the files are not related, they just have the same path - diffs and auto-merges don't make sense).*


copies: [<original_filename>, <copy_filename>]

for cloning a file. This is important for merging as well as documenting the author's intention.


cherrypicks: [<context>, <parent_context>] ...

for auto-merging all changes from an edge into the current state. Unlike the other examples, this potentially affects multiple files.


And 3rd party change types like:

xyzzypatches: [<filename>, <xyzzypatch-sha1>] ...

for when a file's changes have been stored in a magic patch format that accurately documents exactly what a user did (e.g. renamed this variable, added a parameter to this function). Generation of these patches would be done by the author's tools (e.g. a refactoring editor). It would not necessarily be possible to extract the same information on what was changed, how and why by generic textual comparison (e.g. diff) of the former and latter states.

Note that the order in which changes are applied is significant, and the same change type could be used multiple times with different change types in between. It may be clearer (though less efficient) to define change types in the singular and list them one by one separately.

* The "replaces" change type could equally well be represented by a "dels" of the filename, followed by an "adds" of the same filename with the new hash. It's just an example.
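
As a rough sketch of the "singular, one by one" idea, the following Python applies an ordered list of changes to a state. The tuple encoding, the `apply_ordered` name and the exact error rules are my own assumptions for illustration. Only "add", "del", "rename" and "replace" are modelled.

```python
def apply_ordered(manifest, changes):
    """Apply singular changes strictly in the order they are listed.

    manifest: dict of filename -> file hash (an assumed model of a state).
    changes:  list of tuples such as ("rename", old, new).
    """
    state = dict(manifest)
    for kind, *args in changes:
        if kind == "add":
            name, file_hash = args
            if name in state:
                raise ValueError(f"add: {name} already exists")
            state[name] = file_hash
        elif kind == "del":
            (name,) = args
            del state[name]
        elif kind == "rename":
            old, new = args
            if new in state:
                raise ValueError(f"rename: {new} already exists")
            state[new] = state.pop(old)
        elif kind == "replace":
            name, file_hash = args
            if name not in state:
                raise ValueError(f"replace: {name} does not exist")
            state[name] = file_hash
        else:
            raise ValueError(f"unknown change type: {kind}")
    return state


start = {"a.txt": "h1"}

# "replace" reaches the same state as "del" followed by "add".
via_replace = apply_ordered(start, [("replace", "a.txt", "h2")])
via_del_add = apply_ordered(start, [("del", "a.txt"), ("add", "a.txt", "h2")])

# Order matters: rename-then-add works, add-then-rename does not.
renamed_then_added = apply_ordered(
    start, [("rename", "a.txt", "b.txt"), ("add", "a.txt", "h2")]
)
try:
    apply_ordered(start, [("add", "a.txt", "h2"), ("rename", "a.txt", "b.txt")])
    order_error = None
except ValueError as exc:
    order_error = str(exc)
```

The resulting states of `via_replace` and `via_del_add` are identical, so only the recorded list preserves what the author actually did.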

   be it. the only remaining "missing" concept would be "file GUIDs",
   which I consider mostly meaningless anyways; imo if you have enough
   shared history to have a shared GUID, you probably have enough to
   work out the naming relationship by tracing through rename history.

I agree with this, though currently it's not possible to do things like "resurrect" a file in a way that allows accurate tracking of that file through history (though unreliable heuristics could be used). There are ways to do this perfectly without file GUIDs, though (e.g. through new change types).

 - make a clear future distinction between certs which are about
   a change (context certs) and certs which are about a particular
   tree state (manifest certs). this difference is evident for example
   in the difference between approval (context) and testresults
   (manifest), but it's not really as clear at the moment.

I'm still not convinced that there's a need for manifest certs... I think even testresults certs should apply to a context. Branch certs can only sensibly apply to a context, not a manifest; different branches can be completely different projects; completely different projects can have completely different procedures for determining testresults. Of course, this example isn't very clever (it's unlikely that you'd get the same manifest in different projects), but there are several other reasons I don't think it makes sense to apply any certs to a manifest.

   - I'd have an excuse to unpack and index the fields which I know
     the substructure of (author, date, ancestor, etc.) which would
     speed and simplify a lot of local operations.

This information could equally well be extracted from certs for indexing, right?

 - take no more space. all these items are generated each time we do
   a commit already, but as *separate* certs. the certs aren't free:
   generally there are about 300 extra bytes of cryptographic data
   along for the ride on each one. that makes a commit cost about
   1500 bytes in crypto; this data object would probably weigh no more
   than that, possibly even less.

I don't remember whether I brought this up before, but I think that having a way to bundle certs together is quite important. These "cert bundles" would contain several properties, and be timestamped and signed as a whole. There are a number of reasons I'd like this, the least important of which being that it would reduce the signature overhead.
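
For a rough idea of the signature overhead, here is the arithmetic using the figures quoted above (about 300 bytes of cryptographic data per cert, about 1500 bytes per commit). The count of five certs per commit is inferred from those two numbers, and the assumption that a bundle carries a single signature of the same size is mine.

```python
SIGNATURE_OVERHEAD = 300  # bytes of crypto per signed object (figure quoted above)
COMMIT_CRYPTO_COST = 1500  # bytes of crypto per commit (figure quoted above)

# Inferred, not stated: how many separately signed certs a commit produces.
certs_per_commit = COMMIT_CRYPTO_COST // SIGNATURE_OVERHEAD

# Assumption: a cert bundle is signed once, as a whole.
separate_cost = certs_per_commit * SIGNATURE_OVERHEAD
bundled_cost = 1 * SIGNATURE_OVERHEAD
saved = separate_cost - bundled_cost
```

Under those assumptions a bundle saves `saved` bytes of crypto per commit, without putting any of the metadata into the context itself.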

 - there would be a certain distinction between "core" and "auxiliary"
   metadata: the stuff mentioned in the context will have a seeming
   primacy over additional, 3rd party certs hung on the side. the
   experience so far seems to suggest that nobody ever sticks 3rd party
   author, date, or rename certs on a manifest anyways, so I'm not sure
   how much would be lost there.

I think an awful lot would be lost in flexibility and simplicity. I can think of a whole lot of custom certs I'd like to add myself at commit time. I'd certainly mourn the loss of a consistent approach to metadata.


Jerome

(Graydon: Sorry about the bad quoting in my last email, I was a little overexcited)
