Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?)

From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?)
Date: Sun, 27 Aug 2006 03:10:45 -0700
User-agent: Mutt/1.5.12-2006-07-14

On Fri, Aug 25, 2006 at 06:29:53PM +0200, Christof Petig wrote:
> Nathaniel Smith schrieb:
> > On Fri, Aug 25, 2006 at 09:18:01AM +0200, Christof Petig wrote:
> >> cvssync2 adds a file (called .mtn-sync-cvs which contains (more than) a
> >> table of ( revision[/keyword_substitution] path file_id ) per revision
> >> to store that information (easily delta encoded and distributed this 
> >> way)).*
> > 
> > Would it make more sense to store these as attrs on each file,
> > instead of in file data?  cvs:revision or something?  Save on the
> > number of file formats that need inventing?
> e.g.
> file_id 03cfd743661f07975fa2f1220c5194cbaff48451
> is ::ext::address@hidden:/usr/local/cvsroot's  module/subdir/A
>  and .../A 1.18
>  and .../B 1.2

As Lapo pointed out, I think you might have misunderstood... I didn't
mean to attach this information to file_id's.  I meant to attach it
to files in the tree, using the attr mechanism.  Such attrs are
stored and transferred reasonably efficiently.
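
To make the shape of that concrete, here is a minimal sketch in Python
(not monotone code).  It models a tree as a mapping from path to attrs
and hangs a cvs:revision attr on each file.  The attr name is only the
"or something" from my earlier mail, and the helper names are made up
for illustration; the paths and revision numbers are the ones from the
example quoted above.

```python
# Hypothetical model: a tree maps each path to a dict of attrs.
# "cvs:revision" is an illustrative attr name, not an established one.

def set_cvs_revision(tree, path, rcs_revision):
    """Return a new tree with the CVS revision recorded as an attr on path."""
    new_tree = {p: dict(attrs) for p, attrs in tree.items()}
    new_tree.setdefault(path, {})["cvs:revision"] = rcs_revision
    return new_tree

def cvs_revisions(tree):
    """Collect path -> CVS revision for every file that carries the attr."""
    return {p: a["cvs:revision"] for p, a in tree.items() if "cvs:revision" in a}

tree = {"module/subdir/A": {}, "module/subdir/B": {}}
tree = set_cvs_revision(tree, "module/subdir/A", "1.18")
tree = set_cvs_revision(tree, "module/subdir/B", "1.2")
```

The point is just that the per-file table falls out of the tree itself,
so no separate file format is needed to carry it.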

Note, btw, that cvs_import cannot give a network address where the
repo can be accessed -- if we want cvs_import and cvssync to write
sync information in compatible ways, this may be a concern.

> > Is using file_id information to detect staleness of this information
> > dangerous?  I'm wondering what happens if someone does an edit,
> > commits it, then reverts the edit again and commits _that_, so the
> > file goes A -> B -> A ... it seems like an easy place to get confused,
> > and there are _already_ plenty of places to get confused when
> > dealing with the mess that is cvs.
> The file_id in the table was recently added by me to cover:
> - a command modifies the sync information but the files are not in sync
> with cvs

I guess the cert mechanism I suggested would avoid this issue.  I'm
not sure that recommends it a whole lot, though; if people want to
corrupt the data, they will find a way :-).  I guess maybe it is a
little fragile to assign special meaning to any modifications of a
particular file, though; it's overloading an existing operation,
rather than creating a new one.
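
For reference, the A -> B -> A case I was worried about looks roughly
like this.  This is a small Python sketch, not anything from cvssync;
it assumes a file_id is the SHA-1 of the file contents, and the file
contents and variable names are invented for the example.

```python
import hashlib

def file_id(content: bytes) -> str:
    """Content hash, standing in for a monotone file_id (SHA-1 of the file data)."""
    return hashlib.sha1(content).hexdigest()

# History of one file: edit A -> B, then revert B -> A.
history = [b"version A\n", b"version B\n", b"version A\n"]
ids = [file_id(c) for c in history]

# Suppose the sync table recorded the file_id at the first revision.
recorded = ids[0]

# A check based only on file_id says "unchanged" at steps 0 and 2,
# and cannot tell that an edit and a revert happened in between.
looks_unchanged = [fid == recorded for fid in ids]
```

Whether that matters depends on what the sync code infers from
"unchanged"; the sketch only shows that the two A states are
indistinguishable by id alone.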

> - a user imports a changed cvs working directory into monotone and works
> from that. I need a mechanism to remember which files were already
> changed at that point (by inserting an invalid (or missing?) file_id)

Hmm, wouldn't the right way to do this be to import the base tree that
they're working from, commit that, and then set up their workspace as
a modified checkout of this pristine import?

> > Another approach to handling staleness might be to have a cert that
> > says "the revision I am attached to has non-stale rcs version number
> > data in it".  You just put this on trees that have been synched into
> > cvs, and it's pretty easy to find the latest revision with such a
> > cert.  Then you don't really have to have the whole machinery for
> > figuring out which bits of data are stale, etc.?
> We talk about different concepts of staleness:
> - whether a file was edited in comparison to the recorded cvs revision
> - whether a revision was synchronized with the server (whether the
> synchronization information is up to date for this revision).
> And I definitely like the concept I designed (the second time):
>  - if the file .mtn-sync-cvs was changed (and this is _not_ a merge
> node) consider that information as valid.

I guess the "_not_ a merge node" caveat is an example of what I meant
above, about overloading making things more complicated :-).

>  - if the revision has a mtn-sync-cvs certificate consider that
> information (delta encoded) as valid
> otherwise consider this revision as unknown to the cvs server
>  - if a file matches the recorded sha1 id it is unchanged.
> Any pull command will move the information back to the .mtn-sync-cvs
> file (which is the most efficient and robust storage method), any push
> command has to fall back to certificates (which are less visible to the
> user) and I plan a command to synchronize that information into the file
> (and refresh $Id$, $Revision$ and the like) by creating an additional
> new monotone revision.
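
As I read the rules quoted above, the lookup comes out roughly as the
following Python sketch.  The function and parameter names are mine,
and treating "merge node" as "more than one parent" is an assumption;
the ordering of the checks is just how the list reads to me.

```python
def sync_info_source(file_changed, is_merge, has_cert):
    """Where to read sync information for a revision, per the quoted rules.

    file_changed: .mtn-sync-cvs differs from the parent revision's copy
    is_merge:     the revision has more than one parent (assumed meaning)
    has_cert:     the revision carries a mtn-sync-cvs certificate
    """
    if file_changed and not is_merge:
        return "file"      # the file's contents are considered valid
    if has_cert:
        return "cert"      # fall back to the delta-encoded certificate
    return "unknown"       # revision is treated as unknown to the cvs server
```

Note how a merge that touches the file drops through to the cert check,
which is the special case mentioned above.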

If you're planning to move things back and forth like that already,
have you considered just always creating that additional new monotone
revision?  Then you could just skip the delta-chained cert stuff.  I
can see the appeal of not creating new revisions when pushing, but it
seems like the simplification might be worth it, I dunno...
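
To illustrate why I think a plain marker is easy to work with: if every
synced revision carries some "this is in sync" marker (the cert I
suggested earlier in the thread), finding the sync base is an ordinary
walk up the ancestry graph.  A Python sketch, with a made-up graph and
made-up names throughout:

```python
from collections import deque

def nearest_synced_ancestors(start, parents, synced):
    """Breadth-first walk up the ancestry graph from start; stop descending
    at any revision that carries the (hypothetical) marker."""
    found, seen, queue = [], {start}, deque([start])
    while queue:
        rev = queue.popleft()
        if rev in synced:
            found.append(rev)
            continue  # marked as in sync; no need to look further back here
        for p in parents.get(rev, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return found

# Toy graph: r1 <- r2 <- r4 and r1 <- r3 <- r4, so r4 is a merge.
parents = {"r4": ["r2", "r3"], "r3": ["r1"], "r2": ["r1"], "r1": []}
synced = {"r1", "r2"}
```

On a merge like r4 this can return more than one candidate (one per
side), and picking among them is left open here.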

-- Nathaniel

Details are all that matters; God dwells there, and you never get to
see Him if you don't struggle to get them right. -- Stephen Jay Gould
