gnu-arch-users

[Gnu-arch-users] Re: [PATCH] arch speedups on big trees


From: Miles Bader
Subject: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees
Date: Fri, 9 Jan 2004 16:44:03 -0500
User-agent: Mutt/1.3.28i

On Fri, Jan 09, 2004 at 09:03:06AM -0500, Chris Mason wrote:
> > Huh?  Inode caches help inventories, but they help changeset-creation even
> > more.  On source trees I use, changeset-creation _with_ inode caches is
> > pretty reasonable, but _without_ them, it's a disk-killing operation even
> > with kernel caching of file contents (because two copies of the tree's
> > data [orig+modified] are bigger than the RAM available for caching).
> 
> inode sigs do make tree inventory reads faster, but at the cost of
> making tree changes much slower.

The problem is that for large trees, changeset-creation is a _killer_ without
something like inode-sigs.  On a local disk at least, even doing a full
inventory is _extremely_ cheap compared to that (you could probably do
hundreds of them and still be cheaper than the diffs required by
changeset-creation without inode-sigs).
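
To make the cost argument concrete, here is a rough sketch of the kind of
check inode-sigs allow.  It is not tla's actual on-disk format or code; the
names `inode_sig` and `files_needing_diff` and the choice of stat fields are
assumptions made for illustration.  The point is only that a stat() per file
can rule out most files before any contents are read:

```python
import os

def inode_sig(path):
    """Return a cheap signature built from stat() data alone (no file reads).

    The fields used here are an assumption for illustration, not tla's format.
    """
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime_ns)

def files_needing_diff(paths, cached_sigs):
    """Return the paths whose signature is missing from, or differs from, the cache.

    Only these files need their contents compared against the pristine copy.
    """
    return [p for p in paths if cached_sigs.get(p) != inode_sig(p)]
```

With a cache recorded when the tree was last known to match its pristine
copy, only the files returned by `files_needing_diff` have to be diffed.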

> And then at the end of the commit arch walks the whole tree to redo the
> inode sig.

If there are already inode-sigs, there's no reason why it can't be done
incrementally -- that is, it can produce a new inode-sig that's at least
as good as the inode-sig available prior to the commit.
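
A minimal sketch of what "incrementally" could mean, assuming the inode-sig
is held as a mapping from path to a tuple of stat fields (a hypothetical
representation, not tla's) and that the commit knows which paths it touched:

```python
import os

def update_sigs(old_sigs, touched_paths):
    """Return a new signature map, re-statting only the paths the commit touched.

    Entries for untouched paths are copied as-is, so the result is no worse
    than old_sigs: an entry that was stale before is still stale and will
    simply fail to match later, which forces a real comparison.
    """
    new_sigs = dict(old_sigs)
    for path in touched_paths:
        if os.path.exists(path):
            st = os.stat(path)
            new_sigs[path] = (st.st_ino, st.st_size, st.st_mtime_ns)
        else:
            # The file was deleted or renamed away by the commit.
            new_sigs.pop(path, None)
    return new_sigs
```

The cost is proportional to the number of files in the commit rather than to
the size of the tree.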

> Fixing inode sigs isn't impossible: when you build the revision, send
> the inventory already taken during build to the inode sig creation
> funcs.

Sure, Tom's said (many times) that the current inode-sigs implementation is,
um, non-optimal in terms of disk transactions.

> While applying a changeset, change the sigs using an inode sig format
> that allows for partial updates.  If Tom really doesn't want an indexed
> database file, the sig checksums and other important metadata could go
> into the files in ++id-mapping in my current patch.

Please, not the individual file-per-file implementation!

> > Is a reverse-mapping even necessary?
> 
> The early versions of my code didn't have the reverse mapping, and after
> Tom clued me into some of the issues involved, I took an approach very
> similar to what you describe below.  The problem is that for changesets
> that add files, a full inventory would be required to make sure that the id
> doesn't already exist in the tree.
> 
> The patches that I need to apply frequently add files, so this method
> was too slow for my usage.

So all you really need is a big bag of existing ids you can check against for
conflicts?

You can't rely on your DB to catch any conflicts if you're running at a point
where the user could have changed the tree, so it seems that you've _got_ to
at least do a single full inventory at the start of a given user-level tla
command.

Given that, keeping the DB on disk seems pointless.

Why not just be more careful to maintain a big-bag-of-ids internally to arch,
based on the first inventory that gets done in a given user command?  Since
it's an arch-updated data structure, this should be no less accurate than
maintaining an on-disk version, and since it's based on an initial inventory
instead of a possibly-out-of-date disk DB, it should be more accurate.

Then use the `changeset based' method I described earlier for everything
else; if you've already implemented a version of this, it shouldn't be too
hard...
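
Roughly, the big-bag-of-ids could look like the sketch below.  The class
name `IdBag`, its methods, and the (id, path) inventory shape are all
hypothetical, chosen only to illustrate the idea; none of this is existing
arch code:

```python
class IdBag:
    """In-memory set of file ids, built from one full inventory per command."""

    def __init__(self, inventory):
        # inventory: iterable of (file_id, path) pairs from a single tree walk.
        self._paths = {file_id: path for file_id, path in inventory}

    def conflicts(self, added_ids):
        """Return the ids a changeset wants to add that already exist in the tree."""
        return sorted(i for i in added_ids if i in self._paths)

    def apply(self, added, removed_ids):
        """Keep the bag in sync as arch itself adds and removes files."""
        for file_id in removed_ids:
            self._paths.pop(file_id, None)
        for file_id, path in added:
            self._paths[file_id] = path
```

Since the bag is rebuilt from a fresh inventory at the start of each user
command and then updated only by arch itself, nothing has to be written to
disk, and there is no stored DB to go stale when the user edits the tree.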

-Miles
-- 
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."



