

[Gnu-arch-users] Re: [PATCH] arch speedups on big trees

From: Chris Mason
Subject: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees
Date: Wed, 07 Jan 2004 00:07:08 -0500

On Tue, 2004-01-06 at 22:41, Miles Bader wrote:
> Chris Mason <address@hidden> writes:
> > > One file per id?!  That seems insane for large trees (the only case where
> > > such optimizations are interesting anyway)...
> > 
> > There are a number of indexed filesystems to choose from that will
> > handle one file per id nicely.  There are no operations that will read
> > all the files in ++id-mapping, unless you make a changeset that
> > changes every file in the repository, or you are linking the rep.
>
> That doesn't help with the space issue.  For instance, on a UFS
> filesystem, the fragment size is typically 512 bytes, so a linux source
> tree would have about 8MB of index files; on an ext2 filesystem, it
> would be 64MB!
> [`Use a different filesystem' is not a good answer -- I use reiserfs
> right now, but using techniques that perform poorly on ext2 is not a
> good way to become popular.]

Yes, there's a wasted-space issue, but arch already has that in general
with id files.  One file per id in the reverse mapping is somewhat
consistent with how arch already does things.  Yeah, this isn't a real
answer, but there's a little more below ;-)
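
For reference, the 8MB and 64MB figures quoted above can be reproduced with a
quick back-of-the-envelope sketch.  The file count (16384) is my inference from
the quoted numbers, not a measured count, and the 4096-byte ext2 block size and
the 64-byte payload are likewise assumptions.  Inode overhead is ignored.

```python
import math

def index_overhead_bytes(num_files, payload_bytes, alloc_unit):
    """Disk space consumed when every id is stored in its own small file.

    Each file occupies at least one allocation unit (fragment or block),
    no matter how little data it actually holds.
    """
    units_per_file = max(1, math.ceil(payload_bytes / alloc_unit))
    return num_files * units_per_file * alloc_unit

NUM_FILES = 16 * 1024  # assumption: inferred from 8MB / 512 bytes
PAYLOAD = 64           # assumption: an id file holds one short id string

ufs = index_overhead_bytes(NUM_FILES, PAYLOAD, 512)    # 512-byte fragments
ext2 = index_overhead_bytes(NUM_FILES, PAYLOAD, 4096)  # assumed 4KB blocks

print(ufs // 2**20, "MB with 512-byte fragments")
print(ext2 // 2**20, "MB with 4096-byte blocks")
```

The overhead scales with the allocation unit rather than with the payload,
which is why the same index is eight times larger on the bigger block size.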

> > The reverse mapping improves the speed at which changesets are applied,
> > and this happens without any special work from the user.  Applying
> > changesets is a pretty fundamental operation, so making that faster
> > improves things across most of arch.
>
> Wait a minute, so you've _entirely disabled_ inode-sigs?  That seems
> like a fairly significant lose...

From a performance point of view, inode sigs seem intended to make it
faster to inventory the whole tree.  On the surface, this is a great
idea, except that updating the inode sig after each commit requires a
fresh inventory of the whole tree.

As the size of the tree grows, there's just no way to make the inventory
painless.  My patches try instead to limit the number of times a whole
tree inventory is done.

> I'm a bit confused -- if you're going to depend on keeping the
> reverse-mapping up-to-date, why is that any more reliable than keeping
> inode-sigs up-to-date?  Why not just have one big `signature database'
> (preferably not `big' in reality of course :-) that includes both inode
> and pathname information, and make sure it's always kept as up to date
> as possible by all operations?

I guess I went into this assuming there was a reason arch didn't already
use an embedded database for ids (Tom's personal taste?).  The reverse
mapping is really just a database-style index into the source tree.
There's not much semantic difference between having id files and having
all relevant info stored in a database.
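
To make the "index" idea concrete, here is a minimal sketch of a
one-file-per-id reverse mapping.  This is not the code from the patch: the
directory layout, the SHA-1 file naming, the sample id, and the function names
are all hypothetical.  It only illustrates that looking up one id touches one
small file and never walks the source tree.

```python
import hashlib
import os
import tempfile

def id_filename(file_id):
    # Hypothetical naming scheme: hash the id so it is always a safe filename.
    return hashlib.sha1(file_id.encode("utf-8")).hexdigest()

def record(mapping_dir, file_id, path):
    """Store (or overwrite) the tree-relative path for one id."""
    with open(os.path.join(mapping_dir, id_filename(file_id)), "w") as f:
        f.write(path + "\n")

def lookup(mapping_dir, file_id):
    """Return the path recorded for an id, or None if the id is unknown."""
    try:
        with open(os.path.join(mapping_dir, id_filename(file_id))) as f:
            return f.read().rstrip("\n")
    except FileNotFoundError:
        return None

with tempfile.TemporaryDirectory() as mapping_dir:
    record(mapping_dir, "x_example_id_1", "src/fs/inode.c")  # made-up id
    found = lookup(mapping_dir, "x_example_id_1")
    missing = lookup(mapping_dir, "x_no_such_id")

print(found, missing)
```

Swapping the directory for an embedded key/value database would keep the same
two operations and only change where the bytes live.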

> It just doesn't seem like these things should be at odds with one
> another.
>
> > The inode sig could be an indexed file for partial updates.
>
> [Is it really even necessary? -- even reading/writing a big sequential
> file probably pales compared to the overhead of doing tons of stats.]

It depends on the size of the archive vs the number of files usually
changed by a given changeset.  For large trees, if the average changeset
only touches a few files, it's going to be significantly faster to stat
those few files than to write a big sequential file.

For small source trees, you probably (hopefully ;) won't be able to tell
the difference.
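
A rough sketch of the tradeoff, under my own assumptions: the signature is
modeled as a dict of (inode, mtime, size) tuples per path, which is not
necessarily the format arch uses, and the function names are hypothetical.  A
full inventory stats every file, while a partial update stats only the paths a
changeset touched.

```python
import os
import tempfile

def file_sig(path):
    # Assumed signature contents: inode number, mtime, and size.
    st = os.stat(path)
    return (st.st_ino, st.st_mtime_ns, st.st_size)

def full_inventory(root):
    """Stat every file under root: cost grows with the size of the tree."""
    sigs = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            sigs[os.path.relpath(full, root)] = file_sig(full)
    return sigs

def partial_update(root, sigs, touched):
    """Re-stat only the touched paths: cost grows with the changeset."""
    for rel in touched:
        full = os.path.join(root, rel)
        if os.path.exists(full):
            sigs[rel] = file_sig(full)
        else:
            sigs.pop(rel, None)  # file was deleted by the changeset
    return len(touched)          # number of paths examined

with tempfile.TemporaryDirectory() as root:
    for i in range(100):
        with open(os.path.join(root, f"f{i}.txt"), "w") as f:
            f.write("x")
    sigs = full_inventory(root)
    n_full = len(sigs)
    with open(os.path.join(root, "f0.txt"), "w") as f:
        f.write("xyz")           # simulate a changeset touching one file
    n_partial = partial_update(root, sigs, ["f0.txt"])
    new_size = sigs["f0.txt"][2]

print(n_full, "paths in the full inventory,", n_partial, "in the partial update")
```

Whether the partial update wins in practice still depends on how the
signatures are stored, which is the point of the indexed-file suggestion.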

> > Changes to make pristine trees hard linkable and replaceable do require
> > user work (you have to specify a hook), but that is in the same style as
> > hard linkable libraries.  It's a new feature, so the defaults haven't
> > really shaken out yet.
>
> BTW, can you expand on the reasons why you want to keep pristine trees
> -- they're generally a lot more annoying to manage than revision
> libraries, so it'd be nice if they were as unnecessary as possible.  Is
> it just locking issues (you can assume you've got sole access to
> {arch})?

It's mostly that I like the idea of the pristine tree walking forward in
revision level as you do commits.  That way the next commit is ready for
quick diffing, and doesn't require a full inventory.  I'm not married to
using pristine trees for this, but they seem to be working nicely.
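
As an illustration only, "walking the pristine tree forward" could look
something like the sketch below.  It assumes the commit already knows which
paths changed, and it simply copies those files across.  The function name and
arguments are hypothetical, and arch itself would apply a changeset rather
than copy files.

```python
import os
import shutil
import tempfile

def advance_pristine(working, pristine, changed):
    """Bring the pristine tree up to the just-committed revision.

    Only the paths listed in `changed` are touched, so no full
    inventory of either tree is needed.
    """
    for rel in changed:
        src = os.path.join(working, rel)
        dst = os.path.join(pristine, rel)
        if os.path.exists(src):
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)   # added or modified file
        elif os.path.exists(dst):
            os.remove(dst)           # file deleted in the commit

with tempfile.TemporaryDirectory() as top:
    working = os.path.join(top, "working")
    pristine = os.path.join(top, "pristine")
    os.makedirs(working)
    os.makedirs(pristine)
    with open(os.path.join(working, "a.txt"), "w") as f:
        f.write("new contents")
    with open(os.path.join(pristine, "a.txt"), "w") as f:
        f.write("old contents")
    with open(os.path.join(pristine, "b.txt"), "w") as f:
        f.write("removed by the commit")

    advance_pristine(working, pristine, ["a.txt", "b.txt"])

    with open(os.path.join(pristine, "a.txt")) as f:
        a_text = f.read()
    b_exists = os.path.exists(os.path.join(pristine, "b.txt"))

print(a_text, b_exists)
```

After this step the pristine tree matches the committed revision, so the next
diff only has to compare against it.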

