Re: [Gnu-arch-users] Some issues


From: Colin Walters
Subject: Re: [Gnu-arch-users] Some issues
Date: Wed, 09 Jun 2004 21:29:48 -0400

Now, a more thorough reply:

On Wed, 2004-06-09 at 10:03, Florian Weimer wrote:
        
>       * The changeset format is defined relative to GNU patch and GNU
>         tar. These data formats are still somewhat in flux.

Can you back this up?  I have never heard of any problems.

>         The changeset format does not handle binaries efficiently, 

The changeset format supports them as efficiently as possible. 
Changesets are intended to be used like patches are - i.e. you can send
and retrieve them as self-contained entities.

But sure, delta-compression is something that could be done with a smart
server, as has been discussed in the past.  There's no reason this would
have to break backwards compatibility.

>       * and certain text files (e.g. XML files not created by a text
>         editor and formatted for readability).

It wouldn't be hard to imagine extending the changeset format to include
a delta from a higher-level tool that knows about the file format, *in
addition* to the regular GNU diff.  That way if there is a conflict, the
user's tla could optionally call out to an external program which would
make use of this information.  Otherwise, they just get the plain diffs.

>       * In essence, an archive consists of concatenated changesets,
>         which are directly exposed in a file-based interface. This
>         makes it very complex to address issues with the changeset
>         format itself, and the archive interpretation might change
>         when new versions of patch and tar are installed. 

This is just a reworded version of your first point, which has
already been addressed.

>       * Arch does not implement a distributed system. For example, its
>         archive replication does not transparently handle write
>         operations.

Is this just a really obtuse way of saying "cacherevs for older
revisions aren't automatically mirrored"?  That's an easy-to-fix bug,
and I think it already has been fixed.

>       * There is no integrated mechanism to atomically commit related
>         changesets to two branches (even if these branches are
>         contained in the same archive).

I can't imagine a use for this at present, but you could write a little
script to do it using lock-revision, and add a --unlocked argument to
commit which makes it assume the revision has already been locked. 
Should be about 30 minutes worth of work at most.
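
A rough sketch of the shape such a script might take, written in Python
against an in-memory model rather than a real archive.  The Branch class
and its lock_revision, unlock_revision and commit_unlocked methods are
hypothetical stand-ins; commit_unlocked models the proposed --unlocked
argument, which tla's commit does not have today.

```python
class Branch:
    """In-memory stand-in for an arch branch (hypothetical, not tla's API)."""

    def __init__(self, name):
        self.name = name
        self.revisions = []   # committed changesets, in order
        self.locked = False   # True while the next revision is locked

    def lock_revision(self):
        if self.locked:
            raise RuntimeError(f"{self.name}: next revision already locked")
        self.locked = True

    def unlock_revision(self):
        self.locked = False

    def commit_unlocked(self, changeset):
        # Models the proposed 'commit --unlocked': the caller must hold the lock.
        if not self.locked:
            raise RuntimeError(f"{self.name}: revision is not locked")
        self.revisions.append(changeset)
        self.locked = False


def commit_both(first, second, changeset_a, changeset_b):
    """Commit to two branches only if both next revisions can be locked."""
    held = []
    try:
        for branch in (first, second):
            branch.lock_revision()
            held.append(branch)
    except RuntimeError:
        # Release only the locks this call acquired, then report failure.
        for branch in held:
            branch.unlock_revision()
        raise
    first.commit_unlocked(changeset_a)
    second.commit_unlocked(changeset_b)
```

Taking both locks before committing anything means a contended branch
aborts the whole operation with nothing written.  A failure between the
two commits would still leave the first one in place, so this narrows
the window rather than closing it.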

>       * Categories, branches, and versions are not orthogonal at all
>         and add unnecessary complexity. Future features cannot
>         differentiate between them because they are used very
>         inconsistently in existing archives.

This is way too vague.  Do you have a concrete problem?

>       * The idea to automatically subject files to revision control,
>         based on regular expressions, is very hard to deal with for
>         users. While being an interesting experiment, it does not lead
>         to increased usability.

You don't have to - just use tla add, and ignore the warnings about
untagged files matching the source regexp.  Or just delete the warning
in your copy of tla.  Really, this is just a trivial UI issue.

>       * GNU arch does not support a centralized development model
>         which lacks a single, designated committer.

This has been thoroughly debunked.

>         Branch creation is not versioned. 

Is this a problem?

>       * Branches cannot be deleted. 

I don't think this should be possible.
> 
> Please note that while these issues are likely too fundamental to be
> fixed in GNU arch without breaking backwards compatibility, 

Actually, *none* of the issues you have raised requires breaking
backwards compatibility to solve, as discussion has shown.

> Implementation Issues

Most of these are just bugs, as you know.

>       * The access methods for remote archives are subject to a lot of
>         round trips. Therefore, archive replication using tla itself
>         is very slow.

I believe pipelining is already implemented for SFTP; someone just has
to do it for HTTP.

>       * The archive format optimizes for access to early versions, not
>         most recent ones as one would expect. (Once the archive format
>         is no longer exposed directly, this becomes an implementation
>         issue, not a design issue.)

This should be solved with the backbuilder, along with a little cron job
to cache revisions.

>       * The caches which compensate for the previously mentioned issues
>         are not expired by tla. (This includes revision libraries and,
>         apparently, pristine copies stored inside a checked-out copy
>         of a revision.) 

Easily implemented via cron.
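
As an illustration only, a cron job along these lines could do the
expiry.  The function name expire_cache, the flat one-entry-per-revision
layout under the cache root, and the use of modification time as a
stand-in for "last used" are all assumptions, not a description of how
tla lays out a revision library.

```python
import os
import shutil
import time


def expire_cache(root, max_age_days, now=None):
    """Delete top-level entries under root not modified within max_age_days.

    Returns the sorted names of the entries that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    # Snapshot the listing first so deletions do not disturb the iteration.
    for entry in list(os.scandir(root)):
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from cron with whatever directory and age limit suit the site.
Access time would track real use more closely than modification time,
but it is often disabled by mount options.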

>       * Changesets are tar files. They cannot be posted easily to a
>         mailing list for approval and commit; metadata tends to get
>         lost.

Umm...you can post tar files to mailing lists.  People do it all the
time.

>       * In practice, tla requires four inodes per file in a
>         checked-out project tree: one for the file, one for the file
>         ID, and a pristine copy of both. This gratuitous use of
>         inodes can cause problems.

What problems?

>       * A checked-out revision of a branch contains at least one inode
>         for each revision that was ever committed in the history of
>         the branch. Long-running branches also result in huge
>         directories with lots of entries.

Nope - you can delete patch logs.

>       * The inventory code can create inconsistent results. For
>         example, explicit tagging only overrides classification based
>         on regular expression in some (but not all) parts of tla.

Just a bug, if it still exists.

>       * The inventory constructor, project tree checker, and changeset
>         creation code are not fully synchronized. For example, it is
>         possible to commit a changeset with an inconsistent inventory,
>         which is also inconsistent as a result.

Just another bug.

>       * Branch creation is very cheap (a few inodes in the archive),
>         but a long-running branch to which changes in a mainline
>         branch are periodically merged replicates all changes on
>         mainline. This means that branch maintenance costs are
>         controlled by the amount of development on the branch and the
>         development on the mainline, and branches are no longer very
>         cheap in total. (This is an implementation issue because
>         unlike other systems, merge tracking does not depend on the
>         way changesets are combined in the archive. This is actually a
>         very strong point of GNU arch.) 

I think this would be possible to solve by using something like
"interdiff" to only store the differences relative to another changeset.

> 
>       * The GNU arch developers believe that it's easy for all
>         developers participating in a project to publish a repository.

I don't know how arch could possibly make it easier.  What do you
propose instead?

>       * Genuine support for centralized development is required, but
>         GNU arch is unlikely to provide it. 

You keep repeating this.  It is completely false.

>       * The tendency to trade decreased code complexity for increased
>         running time and more disk space was fine when tla got
>         started, but today, it results in performance that does not
>         compare favorably with optimized competitors. In addition,
>         disk seek times have not improved at a significant rate, and
>         the huge amount of stat operations performed by tla will
>         remain a bottleneck even when developers move to larger
>         machines.

Sure.  The inventory code could be optimized.

>         The developers seem to underestimate the need for a robust
>         user interface with clear error messages 

A number of these are already fixed, waiting to be merged.

>         and transaction semantics (i.e. a command either fails and
>         changes nothing, or it completes successfully).

tla should unlock a revision it locked when an error occurs, yes.

>         tla input and output formats are currently deliberately
>         incompatible with the rest of the GNU system. 

Yeah, the pika encoding seems like crack to me too.

>       * Redesign the changeset format, probably based on VCDIFF (RFC
>         3284). Unlike unified diffs (which are currently used by tla),
>         VCDIFF deltas are one-way and not reversible when just the
>         delta itself is known.

But then you propose more crack :)

>         (This is not so much of a problem; tla uses changesets only in
>         the forward direction most of the time.)

Even if tla itself didn't use changesets backwards, users will want to. 
And with the backbuilder, tla will do it very often.
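
To make the reversibility point concrete, here is a small Python
illustration.  It uses the standard library's difflib.ndiff format,
which is neither unified diff nor tla's changeset format, but shares the
relevant property: the delta records both the removed and the added
lines, so either side can be rebuilt from it.  The helper name
both_directions is made up for this example.

```python
import difflib


def both_directions(old, new):
    """Build one line-based delta and recover both inputs from it."""
    delta = list(difflib.ndiff(old, new))
    recovered_old = list(difflib.restore(delta, 1))  # apply "backwards"
    recovered_new = list(difflib.restore(delta, 2))  # apply "forwards"
    return recovered_old, recovered_new


old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]
assert both_directions(old, new) == (old, new)
```

A one-way delta of the kind described in the quoted text would only
support the second of those two recoveries unless the target file is
also on hand.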
>         
>       * Provide a human-readable changeset format with complete
>         metadata. This format is intended for exchange of patches over
>         mailing lists and should include unified diffs.

As has been discussed a lot in the past, it would be nice.

>       * Do not expose the archive format, but use a changeset server
>         which implements access control (and pipelining, to cut down
>         effects of network latency).

I don't think a server should be required, but it would be nice to have
as an option.

>       * Project trees should not abuse the file system as a database.
>         If a database is required, use a real one (such as BDB or
>         SQLite), or CSV files containing multiple records, but not one
>         file per record.

I think this would be nice to have too.

>       * Use a file cache (with LRU logic) instead of revision
>         libraries.

Why would you want that?  Most of the time you're going to be comparing
complete revisions.  I suppose it might be useful for a file-oriented
web-based arch browser though.  Certainly a file cache doesn't replace
revision libraries.
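
For reference, the kind of per-file LRU cache under discussion might
look roughly like this.  It is a sketch of the general technique, not
anything tla contains; the class name FileCache, the (revision, path)
key, and the byte budget are assumptions made for the example.

```python
from collections import OrderedDict


class FileCache:
    """Least-recently-used cache of file contents, bounded by total bytes."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = OrderedDict()  # (revision, path) -> bytes, oldest first

    def get(self, revision, path):
        key = (revision, path)
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, revision, path, data):
        key = (revision, path)
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        self.entries[key] = data
        self.used += len(data)
        # Evict from the least recently used end until within budget.
        while self.used > self.max_bytes and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
```

Because it holds individual files rather than whole trees, a cache like
this suits single-file lookups such as a web browser view, and does
nothing for whole-revision comparisons.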
