Re: [Monotone-devel] Ideas and questions.

From: Nathaniel Smith
Subject: Re: [Monotone-devel] Ideas and questions.
Date: Sun, 13 Feb 2005 00:21:42 -0800
User-agent: Mutt/1.5.6+20040907i

On Sat, Feb 12, 2005 at 07:44:09PM -0500, Jeremy Fincher wrote:
> 1. "monotone" is too long a command name.  Perhaps a shorter command 
> name can be offered, say, "MT" (lowercase "mt" is taken, unfortunately, 
> but I'm willing to press shift :))

I've considered "mtn"... you can always alias it, of course.

> 2. I noticed in the manual, each user in the test project named his or 
> her database "abe.db" or "beth.db" and put it in his or her home 
> directory -- does this mean that I use one global database for all my 
> Monotone-managed projects?  If so, what is the advantage of this, 
> compared to storing a database (or a link to it) in each working 
> directory's MT/ directory?

We don't have enough experience with people working on multiple
projects using Monotone yet, to really know what's best here.  My
sense is "one db per project" will be what people generally end up
doing, but *shrug* we don't really know.

I suspect that the tutorial project has databases named like that
mostly so that it's easy to refer to them separately in the text.

You could store a separate copy of your project's database in every
working directory, but it would kind of be a waste of space!

You can't use a symlink to point to your database; sqlite will get
annoyed.  (Two instances of monotone using different names to refer to
the same database, won't be able to find each other's rollback logs.)

Fortunately, monotone working copies know how to keep track of where
your database is, so you don't really have any reason to want to make
multiple copies of it.

> 3. It says in the manual, "the cert name branch is reserved for use by 
> monotone."  Does this mean that any given revision may only belong to a 
> single branch at a time?  If so, why is that?  If not, what am I 
> missing?

Hmm, you misinterpret that text -- what it means is that the name
"branch" is privileged, it has special semantics, so we want to warn
people not to use it for any ad hoc certs they might want to create.

I just looked at it, but it's not obvious to me how to make it
clearer; any suggestion?

(Any given revision may certainly belong to more than one branch at a
time.)

> 4. This is just a pet peeve of mine, but what are the chances that 
> Monotone's source code (.cc, .hh files) can be moved into a src/ 
> subdirectory of the main distribution tarball?  As it currently stands, 
> an "ls" in my monotone-0.16 directory doesn't even fit into my 80x51 
> terminal, and that would be fixed if the .cc and .hh files were in 
> their own sequestered directory.

Eh.  Source code is for developers.  Developers have to type more if
they always have to say "src/" before they do anything.  Since
"editing source files" is by far the most important use case of the
source tree... I'm kinda inclined to keep optimizing for that.

> 5. I'm fairly convinced that Monotone is using the proper "model" for 
> version control, but there are user-interface considerations that I 
> think would aid in doing the kind of distributed development Monotone 
> is aiming for.
>   a. I think there should be an easy way for someone to publish a 
> pull-only repository via "commodity" (e.g. HTTP) protocols.  Many
> people may be firewalled or otherwise prevented from publishing their
> repository via a monotone server process, but anyone worth his or her
> free software developing salt should have some webspace somewhere
> to which he can publish a repository.

Part of this could be solved just by getting someone to provide
community monotone hosting, as a service for developers everywhere --
any volunteers? ;-) (We don't know whether the netsync implementation
would stand up to a security audit, though it is coded carefully; the
fundamental design should be secure, there's no need to e.g. give
developers accounts on the machine, like CVS requires.)

This does assume that developers behind firewalls will still be able
to _push_ to a public netsync server, acting as clients; is that a
crazy assumption?

How to do synchronization with a dumb HTTP server:
Export your database to a bunch of nicely arranged flat files.  We'll
actually synchronize two directories of these flat files.  The
interesting part is how to make them "nicely arranged".

Let's define a "merkle directory" format.  Say we have a bunch of
files, that by some coincidence all have names that look like
"86d32c8a7766952af6bd891c2bbf5ae4cf5e3faa" or so.  Split each name
into quads to get a directory to stick that file in, so, in this case,
we get
as our filename.

Now, in each leaf directory, run 'ls' and stick the results in a file
named <that dir>/FILES.  Now, in each directory one level up from a
leaf directory, run 'sha1sum */FILES' and stick the results in a file
HASHES.  Go one directory up and run 'sha1sum */HASHES' and stick the
results in a file HASHES.  Repeat this all the way up the directory
tree.
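
A minimal Python sketch of the indexing step described above.  The
layout is an assumption: every 4-character group of the name becomes
one directory level, and the file sits under its full name in the
deepest directory.  The helper names (quad_dir, store, build_index)
are hypothetical, and the HASHES lines only imitate sha1sum output.

```python
import hashlib
import os


def quad_dir(name, width=4):
    """Split a hex name into fixed-width groups and join them as a relative path."""
    return os.path.join(*[name[i:i + width] for i in range(0, len(name), width)])


def store(root, name, data):
    """Write data to root/<quads>/<name> (assumed layout, see lead-in)."""
    leaf = os.path.join(root, quad_dir(name))
    os.makedirs(leaf, exist_ok=True)
    with open(os.path.join(leaf, name), "wb") as f:
        f.write(data)


def sha1_file(path):
    """Return the hex SHA-1 of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def build_index(directory):
    """Write FILES in leaf directories and HASHES in every directory above them.

    Returns the path of the index file written for `directory`.
    """
    subdirs = sorted(
        e for e in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, e))
    )
    if not subdirs:
        # Leaf directory: the index is just the listing, like 'ls'.
        index = os.path.join(directory, "FILES")
        lines = sorted(e for e in os.listdir(directory) if e != "FILES")
    else:
        # Inner directory: hash each child's index, like 'sha1sum */FILES'
        # or 'sha1sum */HASHES'.
        index = os.path.join(directory, "HASHES")
        lines = []
        for sub in subdirs:
            child = build_index(os.path.join(directory, sub))
            lines.append("%s  %s/%s" % (sha1_file(child), sub, os.path.basename(child)))
    with open(index, "w") as f:
        f.write("".join(line + "\n" for line in lines))
    return index
```

Calling build_index on the top directory after every export refreshes
all the index files; any new file changes the hash chain all the way
up to the top-level HASHES.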

So that's the first trick -- if a clever client is looking at a tree
with this structure, and also has a similar tree locally, it can use
the information there to retrieve all of the new files in O(log(n))
HTTP requests.
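
A rough Python sketch of how such a client might walk the two trees.
Everything here is an assumption for illustration: `remote` and `local`
are hypothetical callables that map a relative path to the text of
that index file (or None if it is missing), and the index line format
is "digest  child/INDEX" as sha1sum would print it.

```python
def join(*parts):
    """Join non-empty path components with '/'."""
    return "/".join(p for p in parts if p)


def parse_hashes(text):
    """Parse 'digest  child/INDEX' lines into {child/INDEX: digest}."""
    entries = {}
    for line in (text or "").splitlines():
        digest, path = line.split(None, 1)
        entries[path] = digest
    return entries


def new_files(remote, local, directory="", index="HASHES"):
    """Return paths listed remotely but not locally.

    Only index files whose hash differs from the local copy are fetched,
    so unchanged subtrees cost no requests at all.
    """
    index_path = join(directory, index)
    theirs = remote(index_path) or ""
    ours = local(index_path) or ""
    if theirs == ours:
        return []
    if index == "FILES":
        # Leaf directory: compare the plain listings.
        have = set(ours.splitlines())
        return [join(directory, n) for n in theirs.splitlines() if n not in have]
    known = parse_hashes(ours)
    found = []
    for path, digest in parse_hashes(theirs).items():
        if known.get(path) != digest:
            child, child_index = path.rsplit("/", 1)
            found.extend(new_files(remote, local, join(directory, child), child_index))
    return found
```

With an HTTP server, `remote` would be a small function that issues a
GET for the given path; the returned list is the set of files still
to download.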

The second trick is that using flat files, we can't expect to be as
clever as netsync, so we don't even try to be that clever.  So we
don't do the complex multi-stage thing where we synchronize certs,
then use that to fetch revisions, then use that to fetch file deltas,

What we do instead is just, given the collection that we're exporting:
  1) find all revisions in that collection.  For each revision id
     $rid, do
       $ monotone rdata $rid > $rid
     Arrange all the resulting files in a merkle directory called
  2) find all certs that are on those revisions.  Somehow extract each
     cert into a file containing an [rcert] packet, named after that
     cert's hash.  (We track these hashes internally, not a big deal
     to do this, though there's no UI to quite get this data out right
     now.)  Arrange all the resulting files in a merkle directory
     called "certs".
  3) find all the manifests in those revisions.  Write out all root
     ones as 'monotone mdata $mid > $mid', write out all successive
     children ones as 'monotone mdelta $mid_parent $mid > $mid'.
     Arrange all the resulting files in a merkle directory called
  4) Do the same thing to files as to manifests, putting them all in a
     merkle directory called "files".
  5) Do the same thing to keys, putting them all in a merkle directory
     called "keys".
  6) (Maybe stick some metadata at the top of the tree, describing who
     exported this and when and what collection it is, just cuz that's
     handy to have around)
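
As an illustration of step 3, here is a small Python sketch that plans
which command would produce each manifest file.  The command names
(mdata, mdelta) are the ones quoted above; the input mapping and the
function name are hypothetical, and the sketch assumes each non-root
manifest has one chosen parent to delta against.

```python
def manifest_commands(parents):
    """Map each manifest id to the argv that would produce its export file.

    `parents` maps a manifest id to its parent id, or to None for a root.
    Roots are written in full; children are written as deltas.
    """
    commands = {}
    for mid, parent in sorted(parents.items()):
        if parent is None:
            commands[mid] = ["monotone", "mdata", mid]
        else:
            commands[mid] = ["monotone", "mdelta", parent, mid]
    return commands
```

Each command's standard output would then be captured into a file
named after the manifest id, matching the '> $mid' redirections in
the steps above.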

When we want to sync, we just use the above-mentioned quickish
algorithm to pull all the new files from each of the 5 merkle
directories.  Whenever we download a file, we append its contents to a
big log file.  When we're done, we feed the whole log file into
"monotone read".  (We could also just spawn "monotone read" on the
other end of a pipe, and feed it as we go.  The important thing is
that we have to feed everything into just one instantiation of
"monotone read", we can't start a new monotone process for each file,
because monotone knows how to sort out all the various packets into an
order that makes sense.)
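
A short Python sketch of the "one big log, one reader" idea.  It
assumes "monotone read" takes its packets on standard input, as the
pipe variant above implies; the function names are hypothetical, and
a real invocation may need further options (such as which database to
use) that are not shown here.

```python
import subprocess


def concatenate(paths, log_path):
    """Append the contents of every downloaded file to one log file."""
    with open(log_path, "wb") as log:
        for path in paths:
            with open(path, "rb") as f:
                data = f.read()
            log.write(data)
            # Keep files separated even if one lacks a trailing newline.
            if not data.endswith(b"\n"):
                log.write(b"\n")


def feed_log(log_path, command=("monotone", "read")):
    """Run a single reader process with the whole log on its stdin."""
    with open(log_path, "rb") as log:
        return subprocess.run(list(command), stdin=log, check=True)
```

The point of the design is that feed_log runs exactly once per sync,
however many files were downloaded.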

Sorta baroque, but really not that complicated, and the hooks needed
to do it aren't very complex...

(Another amusing way to do it would be to have a CGI that spits out
basically the above stuff on demand; again, not too tricky.)

>   b. I think there should be an easy way for a user who publishes a 
> repository in such a manner to accept patches via another
> "commodity" protocol (e.g. SMTP) and perhaps apply them automatically
> if they're appropriately signed.  Come to think of it, is it
> easy/possible for Monotone to "export" a new revision (changeset)
> along with all its attached certificates, etc., in such a way that the
> exported file may be imported by someone else who isn't able to
> connect to a monotone server for some reason?

There's no reason we can't define some parseable format that contains:
 -- text of a revision
 -- human-readable cert data for that revision
 -- little unobjectionable blobs of base64'ed signature data for those
    certs (maybe at the very end of the file)
 -- human readable unidiffs for each file changed in the revision,
    relative to each parent of the revision
and then have a way to import these (checking of course that when the
diffs are applied to the specified file versions they produce files
with the expected sha1, etc.).
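
The hash check at the end of an import could be as simple as the
following Python sketch.  The function name is hypothetical, and the
step that applies the unidiff is left out; the sketch only shows
rejecting a patched file whose SHA-1 does not match the id the
revision names.

```python
import hashlib


def check_patched_file(patched, expected_sha1):
    """Return the patched bytes if their SHA-1 matches, else raise ValueError."""
    actual = hashlib.sha1(patched).hexdigest()
    if actual != expected_sha1:
        raise ValueError("expected %s but patched file hashes to %s"
                         % (expected_sha1, actual))
    return patched
```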

Simple Matter of Programming, basically, and it would be rather handy.
Just no-one's done it.

> Mostly what I'm concerned with here is that Monotone's goal seems to
> be to make distributed development easier, but requiring that change
> propagation occur through netsync (and thus, through channels which
> may be hard to use through various firewalls and other annoying
> devices) raises the bar significantly over, say, Darcs, which just
> allows the user to copy his repo to his webspace and publish a URL.

I'm not sure that netsync is really harder overall, given the vagaries
of dealing with HTTP servers and the mail infrastructure, but hey, if
people want it...

> 6. The manual said that branch names had to be unique globally -- does 
> this apply to all Monotone databases everywhere, or just the ones that 
> are working on my project?

All monotone databases everywhere, which is why we recommend embedding
a domain name into your branch names.

Of course, if you never sync with anyone working on a different
project, then it won't matter if your branch names conflict.  But
that's an assumption that you probably don't want to make.  It's quite
handy to be able to share a netsync server among people, or perhaps to
pull multiple projects into a single database to merge between them
(e.g., if sqlite used monotone, monotone's own source tree could merge
in a branch of it to create the sqlite/ directory, and we'd get
history sensitive tracking of upstream changes that way).

There's also how things work conceptually.  In your mental model of
monotone, it's useful not to think of databases as distinct objects.
Rather, there is One True Database In The Sky, which we modify every
time we hit "commit" or "merge" or whatever.  Any given database has
some partial knowledge of the One True Database; netsync is how
databases share what they know with other databases, so that their
knowledge increases.

If this just sounds silly to you, then you can also ignore it :-).
But it can be helpful.

> 7. One of the things I really liked about Darcs was the ability for me 
> to specify exactly which changes in which files I wanted to record as 
> part of my patch.  Oftentimes I'll have several semantically unrelated 
> changes active and unrecorded in my working directory since I follow a 
> sort of stack workflow, rather than a queue.  It was especially nice to 
> be able to continue working in that way and trust that when I got 
> around to recording the changes I'd made, I could pick and choose 
> exactly which ones to put into which patches.

Eh, umm, could do that with some UI, I guess.  Right now we let you
pick changes with file granularity.  I'm not sure how to do better
than that with a reasonable UI; I'd sorta rather not get into the
whole "designing an interactive text UI" thingie.  It shouldn't be
hard to write it as a 3rd-party kind of add-on, though.

(A good way to convince me that something I think is probably not
worth doing, is in fact worth doing, is to show me that people use it
when given the chance :-).  This is certainly something that could be
implemented with only slight kluging without any changes to monotone
itself, and if it turned out to be useful we could think about how to
remove the kluging.)

-- Nathaniel

"But suppose I am not willing to claim that.  For in fact pianos
are heavy, and very few persons can carry a piano all by themselves."
