From: Andrea Arcangeli
Subject: Re: [Gnu-arch-users] Re: [arch-users] advanced usage advice: the prism technique (fwd)
Date: Sat, 27 Sep 2003 12:57:04 +0200
User-agent: Mutt/1.4.1i

On Sat, Sep 27, 2003 at 12:14:35PM +0200, Florian Weimer wrote:
> On Fri, Sep 26, 2003 at 11:47:05PM +0200, Andrea Arcangeli wrote:
> > So now the only thing I care for efficient operation is that I can cache
> > the unpacked tree in unpacked form (and I believe it's already the
> > case), and that every new tree I checkout with 'get' gets forked sharing
> > the inode with an _hardlink_.
> I wouldn't want to share the reference tree (be it a pristine tree or in
> a revision library) with my working copy.  I'd view this as just too
> dangerous.  Arch could generate wrong patches or change files way back
> into the history of the project.

That's a very interesting point. Ideally we would need a copy-on-write
hardlink facility from the kernel. That should be technically doable,
and it would make tons of other things more reliable too, but it would
surely involve changes to the filesystem's directory structures as well,
so it's certainly nothing we can depend on in the short to mid term.

However, note that to me it wouldn't matter much, since I already take
this risk all the time. My 2.4 and 2.5 trees could share hardlinks too;
they evolve over time and they always share the old hardlinks. It's up
to me not to edit any tree with vi ;), just as it's up to me not to run
`rm -r /`. Only once in a few years have I found a corrupted file
because of a stray non-copy-on-write edit.
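The risk above comes down to how an edit touches a shared inode. A
minimal sketch (assuming a filesystem with hardlinks and plain POSIX
tools): an in-place write corrupts every name sharing the inode, while
a write-then-rename replaces only the directory entry and leaves the
hardlinked pristine copy untouched.

```shell
set -e
dir=$(mktemp -d); cd "$dir"

# Pair 1: an in-place edit (">" truncates and rewrites the shared
# inode) silently corrupts the hardlinked pristine copy as well.
echo original > a-work; ln a-work a-pristine
echo clobbered > a-work
cat a-pristine              # prints: clobbered

# Pair 2: write a new file and rename it over the working name;
# rename swaps the directory entry, breaking the link, so the
# hardlinked pristine copy stays intact.
echo original > b-work; ln b-work b-pristine
echo edited > b-work.tmp && mv b-work.tmp b-work
cat b-pristine              # prints: original

cd /; rm -rf "$dir"
```

Editors that save via a temp file and rename are therefore safe on a
hardlinked tree; editors that rewrite in place (as vi can) are not.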

So from my point of view, even without any copy-on-write mode, being
able to optionally hardlink everything sounds optimal. I understand this
should come with a very fat warning, to keep people from screwing up the
checkout procedure and generating wrong patchsets etc. You must know
what you're doing when you use that mode, but then arch would have a
chance to be nearly as fast as my current scripts, skipping the first
cp -a or tar xzf entirely, and that would surely make a huge difference
in terms of usability if you've got 300 pure branches.
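The difference between the two checkout styles can be seen directly
with a toy tree (a sketch assuming GNU coreutils for cp -la and
stat -c %i): cp -a allocates new inodes and copies all the data, while
cp -la only creates directory entries pointing at the existing inodes.

```shell
set -e
src=$(mktemp -d); lnk=$(mktemp -d); cpy=$(mktemp -d)

# A toy "pristine" tree.
mkdir -p "$src/tree"
echo payload > "$src/tree/file"

# Full copy: new inodes, full I/O -- the cp -a / tar xzf case.
cp -a "$src/tree" "$cpy/tree"

# Hardlink copy: only directory entries are written, so the cost
# is nearly independent of the amount of file data in the tree.
cp -la "$src/tree" "$lnk/tree"

# The hardlinked file shares its inode with the original...
[ "$(stat -c %i "$src/tree/file")" = "$(stat -c %i "$lnk/tree/file")" ] && echo shared

# ...while the plain copy got a fresh inode of its own.
[ "$(stat -c %i "$src/tree/file")" != "$(stat -c %i "$cpy/tree/file")" ] && echo separate

rm -rf "$src" "$lnk" "$cpy"
```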

> On the other hand, if you want to share project trees, which are
> essentially unrelated in terms of arch, it's probably better to use an
> external tool to hard-link files with identical content.  I have some

This tool already exists, and it's attached (it was in a very, very old
package called perforate that I still use sometimes in source form).
Careful: it destroys all the permissions, so don't run it on /dev ;)

        finddup | nodup

They're small enough that I attached them.
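For readers without the attachments, a rough stand-in for what a
finddup | nodup pipeline does (this is a sketch, not the actual
perforate tools; it assumes filenames without whitespace): hash every
file, group by digest, and hardlink each duplicate to the first copy
seen with that content.

```shell
#!/bin/sh
# Find files with identical content under the current directory and
# replace the duplicates with hardlinks to the first copy. Like
# nodup, the duplicate's own permissions/ownership are lost (it ends
# up sharing the kept file's inode), so don't aim it at /dev.
find . -type f -exec md5sum {} + |
sort |
awk 'prev == $1 { print first, $2; next }
     { prev = $1; first = $2 }' |
while read keep dup; do
    ln -f "$keep" "$dup"    # replace the duplicate with a hardlink
done
```

A real tool would compare byte-for-byte rather than trust the hash
alone, and would skip files already sharing an inode.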

Yes, that would save the duplicated space, but it would be slow (its
performance could surely be improved, though), and it would still not
address the speed of a 'get' of a pure tree.

Besides the space argument, one of the main properties of cp -la is
that it costs almost nothing, unlike a cp -a or a tar xzf of a 500M
payload the size of the 2.6 kernel.
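The space side of that claim is easy to check (a sketch assuming GNU
du, which counts each inode only once within a single invocation): a
cp -a tree doubles the data, while a cp -la tree adds essentially only
directory entries.

```shell
set -e
d=$(mktemp -d); cd "$d"

mkdir tree
dd if=/dev/zero of=tree/big bs=1M count=4 2>/dev/null

cp -a  tree copy      # duplicates the 4 MB of file data
cp -la tree linked    # creates only new directory entries

# GNU du charges each inode to the first name it sees, so in one
# invocation the hardlinked tree shows up as nearly zero extra space,
# while the plain copy shows up at full size.
du -s tree linked copy

cd /; rm -rf "$d"
```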

> ideas for speeding up such a tool by caching (in fact I've recently
> written something quite similar for my maildir folders).  This would
> still result in full-tree copies if you use certain merging methods
> (probably just "update"), but otherwise, it should come pretty close to
> what you want.
> Or am I missing something?

I don't think you're missing anything.

Andrea

Attachment: finddup
Description: Text document

Attachment: nodup
Description: Text document
