gnu-arch-users
[Gnu-arch-users] Re: [PATCH] arch speedups on big trees


From: Miles Bader
Subject: [Gnu-arch-users] Re: [PATCH] arch speedups on big trees
Date: 09 Jan 2004 10:38:38 +0900

Chris Mason <address@hidden> writes:
> 1) Do inode signatures actually help performance in the current form?  I
> think they make most uses slower (except revision libraries).  In order
> for the sig to help performance, it needs to be used as a cache more
> frequently than it gets updated.

When an inode cache can be used, it _hugely_ helps performance on a
local disk (less for network filesystems, as stats are more expensive
there, but still cheaper than reads), not least by _keeping the contents
of the files out of the kernel's cache_ -- which is very important on
source trees that are as big as or bigger than your RAM.

Keeping the inode cache up-to-date is basically free, because it uses
the results of operations you have to do anyway (computing a changeset).
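
A rough sketch of the idea in Python (this is not arch's actual code; the
names `inode_sig` and `InodeCache`, and the choice of stat fields, are
assumptions made purely for illustration).  It shows how a stat-only
signature lets you skip reading file contents, and how the cache gets
refreshed as a side effect of a read you had to do anyway:

```python
import hashlib
import os


def inode_sig(path):
    """Cheap signature built from stat() alone; no file contents are read."""
    st = os.stat(path)
    # Assumed field choice: inode number, size, and modification time.
    return (st.st_ino, st.st_size, st.st_mtime_ns)


class InodeCache:
    """Maps path -> (signature, content checksum).  Hypothetical structure."""

    def __init__(self):
        self.entries = {}
        self.reads = 0  # how many times file contents were actually read

    def checksum(self, path):
        sig = inode_sig(path)
        cached = self.entries.get(path)
        if cached is not None and cached[0] == sig:
            # Cache hit: a stat was enough, the contents stay off the disk
            # path and out of the kernel's file cache.
            return cached[1]
        # Cache miss: read the file, then record the result so the cache
        # is brought up to date by work that was needed anyway.
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        self.reads += 1
        self.entries[path] = (sig, digest)
        return digest
```

Calling `checksum()` twice on an unchanged file reads it only once; the
second call costs a single stat.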

The big problem with inode caches now is that they're not kept
up-to-date after changeset _application_, though I think that should be
extremely cheap too.  As far as I know, the only reason it's not done is
that nobody's done it.

> This either doesn't happen, or happens because arch is doing too many
> inventories anyway.  Once you take out some of the extra inventories,
> the inode sigs make less sense.

Huh?  Inode caches help inventories, but they help changeset-creation even
more.  On source trees I use, changeset-creation _with_ inode caches is
pretty reasonable, but _without_ them, it's a disk-killing operation even
with kernel caching of file contents (because two copies of the tree's
data [orig+modified] are bigger than the RAM available for caching).

> 2) Does a reverse mapping safely allow arch_apply_changeset to skip
> whole tree inventories?  I provided a sample reverse mapping
> implementation to help argue that it does.  It's fine if you don't
> like the sample implementation, I'd rather discuss the safety of the
> concept first.

Is a reverse-mapping even necessary?

The changeset already contains pathname->id-tag mappings for both the
original and modified states; so if you apply a changeset in the
`expected' sequence (to the predecessor revision of the changeset), you
don't need to do an inventory first, you could just use the paths in the
changeset.

Of course, you can't rely on that in most cases, but _usually_ the info in
the changeset will be correct.  So it seems like it would be a big win to
simply _verify_ what's in the changeset when applying it, and only do
expensive things like an inventory when you detect that the changeset's
info is incorrect (and even then, in many, many cases, you could probably
do only a partial inventory before finding the correct data).

By verification, I mean, for each pathname->id-tag mapping in the
changeset, look at the file described by the pathname, and see: does it
have the right id-tag?  Since you only have to do this per-modified-file,
it's very cheap compared to a whole tree inventory (if you can use an
inode signature cache, of course it's even _cheaper_).

If for some file in the changeset, you find that the pathname->id-tag info
is wrong (e.g., the user renamed a file that the changeset modifies), then
you could do a whole tree inventory [or even do a sub-tree inventory from
the problematic pathname's parent, progressing towards the tree-root until
you find what you're looking for].
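
The verify-then-fall-back scheme described above might look roughly like
this in Python.  It is only a sketch under stated assumptions: `locate`,
`resolve_changeset`, and the caller-supplied `read_id_tag` function are
hypothetical names, not anything in arch, and how an id-tag is actually
read from a file is deliberately left to the caller:

```python
import os


def locate(tree_root, expected_path, id_tag, read_id_tag):
    """Return the tree-relative path that currently holds id_tag, or None.

    read_id_tag(full_path) is a caller-supplied (hypothetical) function
    that returns the id-tag of one file.
    """
    # Fast path: trust the pathname recorded in the changeset, but verify it.
    full = os.path.join(tree_root, expected_path)
    if os.path.isfile(full) and read_id_tag(full) == id_tag:
        return expected_path

    # Slow path: inventory ever-larger sub-trees, starting at the parent of
    # the expected pathname and moving toward the tree root.  For brevity
    # this re-scans sub-trees it has already covered; a real implementation
    # would skip them.
    subtree = os.path.dirname(expected_path)
    while True:
        base = os.path.join(tree_root, subtree)
        if os.path.isdir(base):
            for dirpath, _dirs, files in os.walk(base):
                for name in files:
                    candidate = os.path.join(dirpath, name)
                    if read_id_tag(candidate) == id_tag:
                        return os.path.relpath(candidate, tree_root)
        if subtree == "":
            return None  # whole tree searched; the id-tag is not present
        subtree = os.path.dirname(subtree)


def resolve_changeset(tree_root, mappings, read_id_tag):
    """mappings: {pathname: id_tag} as recorded in the changeset."""
    return {tag: locate(tree_root, path, tag, read_id_tag)
            for path, tag in mappings.items()}
```

In the common case each file in the changeset costs one id-tag lookup; the
directory walk only happens for files that are not where the changeset
says they should be.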

I think for about 99% of changesets, this would be extremely efficient, and
a lot less trouble (and wasted space) than keeping reverse-mappings around.

Am I missing something?

> How big is big?  In order for reading the file twice to hurt, it would
> have to be a considerable percentage of the system ram.

The real problem is that I don't know, and I suspect the actual details
vary per filesystem.  I think with such filesystems as NFS, there are
limits on file caching much lower than theoretical ones such as RAM size.

-miles
-- 
I'd rather be consing.



