[Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'


From: John A Meinel
Subject: [Gnu-arch-users] Re: [GNU-arch-dev] [ANNOUNCEMENT] /Arch/ embraces `git'
Date: Wed, 20 Apr 2005 08:26:06 -0500
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

Tom Lord wrote:

`git', by Linus Torvalds, contains some very good ideas and some
very entertaining source code -- recommended reading for hackers.

/GNU Arch/ will adopt `git':

From the /Arch/ perspective: `git' technology will form the
basis of a new archive/revlib/cache format and the basis
of new network transports.

From the `git' perspective, /Arch/ will replace the lame "directory
cache" component of `git' with a proper revision control system.

In my view, the core ideas in `git' are quite profound and deserve
an impeccable implementation.   This is practical because those ideas
are also pretty simple.

I started here:

  http://www.seyza.com/=clients/linus/tree/index.html

and for those interested in `git'-theory, a good place to start is

  http://www.seyza.com/=clients/linus/tree/src/liblob/index.html


Overall, though, I am glad to see arch being able to go for compatibility.
It would be nice to see lots of different front-ends all be able to play
with the same archives and working trees.

But I have a question about blobs. They are stored compressed, and the
SHA checksum is for the *compressed* form. I understand this is probably
for performance reasons. I'm concerned, though, that compression routines
may not be 100% deterministic across all platforms. Certainly just
changing the compression level will change the compressed output. It also
doesn't let you experiment with different compression routines, etc.
Also, I might have exactly the same file as you, but if I compress at 8
and you compress at 6, then our blob addresses would be different, and we
wouldn't claim similarity. (Also, is compression deterministic across
machines of different endianness?)

It might be that, as long as you provide exactly the same inputs,
compression is deterministic across all platforms (at least for a
specific compressor, like gzip).
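
To make the concern concrete, here is a minimal Python sketch. It is
not taken from `git' or Arch: the function names are hypothetical, and
zlib stands in for whatever compressor the blob store actually uses.
It compares an address taken over the compressed bytes with one taken
over the plain contents.

```python
import hashlib
import zlib


def address_of_compressed(data: bytes, level: int) -> str:
    """SHA-1 over the zlib-compressed bytes (hypothetical helper)."""
    return hashlib.sha1(zlib.compress(data, level)).hexdigest()


def address_of_content(data: bytes) -> str:
    """SHA-1 over the uncompressed contents (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


data = b"the same file on both machines\n" * 100

# Same contents, different compression levels.
mine = address_of_compressed(data, 8)
yours = address_of_compressed(data, 6)
print("compressed addresses match:", mine == yours)

# Hashing the contents does not depend on how they were compressed.
restored = zlib.decompress(zlib.compress(data, 8))
print("content addresses match:",
      address_of_content(data) == address_of_content(restored))
```

With an address over the contents, the compression level becomes a
storage detail that each site can choose freely. The cost is that a
reader has to decompress a blob before it can verify the address.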

Having the handle fixed at 160 bits also seems limiting. It ties the
entire archive format into exactly one hash.
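
One way to avoid that tie, sketched here purely as an assumption and
not as anything `git' or Arch does, is to carry the hash name inside
the handle itself. The "algo:digest" layout and the helper names below
are made up for illustration.

```python
import hashlib


def make_handle(data: bytes, algo: str = "sha1") -> str:
    """Build a self-describing handle such as 'sha1:<hex digest>'."""
    digest = hashlib.new(algo, data).hexdigest()
    return f"{algo}:{digest}"


def verify_handle(handle: str, data: bytes) -> bool:
    """Recompute the digest with the algorithm named in the handle."""
    algo, _, digest = handle.partition(":")
    return hashlib.new(algo, data).hexdigest() == digest


blob = b"hello, blob\n"
old_style = make_handle(blob)            # 160-bit SHA-1 handle
new_style = make_handle(blob, "sha256")  # same blob, wider hash

print(old_style)
print(new_style)
print(verify_handle(old_style, blob), verify_handle(new_style, blob))
```

Handles of different widths can then coexist in one archive, at the
price of a few extra bytes per handle.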

I suppose this is fine as long as there is a version marker to allow new
blob db versions, and the specific compression routine parameters are
well defined. I just want to make sure that is done up front.
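
A version marker of that kind could be as small as a one-line header
per blob db. The following is only a sketch under assumed names: the
"blobdb" magic word, the field order, and the `BlobDbFormat` class are
all hypothetical, not an existing format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobDbFormat:
    """Hypothetical description of how a blob db was written."""
    version: int      # format version, bumped on incompatible changes
    hash_name: str    # e.g. "sha1"
    compressor: str   # e.g. "zlib"
    level: int        # compression level used when writing blobs

    def to_line(self) -> str:
        """Serialize as a single whitespace-separated header line."""
        return (f"blobdb {self.version} {self.hash_name} "
                f"{self.compressor} {self.level}")

    @classmethod
    def from_line(cls, line: str) -> "BlobDbFormat":
        """Parse a header line, rejecting anything without the magic word."""
        magic, version, hash_name, compressor, level = line.split()
        if magic != "blobdb":
            raise ValueError(f"not a blob db header: {line!r}")
        return cls(int(version), hash_name, compressor, int(level))


fmt = BlobDbFormat(version=1, hash_name="sha1", compressor="zlib", level=6)
line = fmt.to_line()
print(line)
print(BlobDbFormat.from_line(line) == fmt)
```

A reader that sees an unknown version or compressor can then refuse
the db cleanly instead of computing addresses that will never match.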

Also, this doesn't seem to work really well as a revlib format. It
probably makes a great archive format, but revlibs need to know the
contents so they can diff against each other.

(Linus is not literally a "client" of mine.  That's just the directory
where this goes.)


-t

John
=:->

Attachment: signature.asc
Description: OpenPGP digital signature

