Re: [Gnu-arch-users] [PATCH] arch speedups on big trees

gnu-arch-users



From:	Tom Lord
Subject:	Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date:	Wed, 28 Jan 2004 14:09:28 -0800 (PST)



    > From: Chris Mason <address@hidden>

    > I think taglines have a few issues, please correct me if I've got some
    > facts wrong here.  

    > They don't work when you don't control the entire source.  So
    > for mirroring an existing project that isn't using arch, they
    > are not very practical.

That's right.  That's why I suggested that it would be productive for
you to extend one of the optimizations that currently applies only to
tagline ids to also work for explicit ids.

The optimization currently in place is this: in a tagline or implicit
tree, for a file with no corresponding ".id" file, if the inode
signature has not changed since the inode-cache noted the file's id
tag, then assume the id tag has not changed.  In other words, skip
reading the source file to look for an arch-tag: line.

The generalization is: in an explicit, tagline, or implicit tree, for
a file _with_ a corresponding ".id" file, so long as the ".id" file
has not changed and is in the right place, assume the id has not
changed.  In other words, skip reading the .id file.
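To make the two rules above concrete, here is a minimal Python sketch,
not tla's actual code.  The names IdCache, inode_signature, and
read_explicit_id are hypothetical, and the signature fields (device,
inode, size, mtime) are an assumption about what an inode signature
would contain.  The cache is keyed by path, so a ".id" file that has
moved misses the cache and gets read again.

```python
import os
from typing import Callable, Dict, Tuple

# Assumed signature contents; the real inode-signature format may differ.
Signature = Tuple[int, int, int, int]  # (device, inode, size, mtime_ns)


def inode_signature(path: str) -> Signature:
    """Build a signature from a single lstat call, without opening the file."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns)


class IdCache:
    """Hypothetical cache mapping a path to (signature, id)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Signature, str]] = {}
        self.reads = 0  # how many times a file's contents were actually read

    def lookup(self, path: str, read_id: Callable[[str], str]) -> str:
        sig = inode_signature(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == sig:
            # Signature unchanged: assume the id is unchanged, skip the read.
            return cached[1]
        # Cache miss or stale signature: read the file and remember the result.
        self.reads += 1
        file_id = read_id(path)
        self._entries[path] = (sig, file_id)
        return file_id


def read_explicit_id(id_path: str) -> str:
    """Read the first line of a '.id' file (simplified, assumed layout)."""
    with open(id_path, "r", encoding="utf-8") as handle:
        return handle.readline().strip()
```

Calling cache.lookup(id_path, read_explicit_id) twice on an untouched
".id" file should cost one read and two lstat calls.  Passing a
different reader (one that scans a source file for an arch-tag: line)
gives the existing tagline case.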


    > taglines don't reduce the need for inventories.  

Correct.  But....

Arch certainly has to stat all of the files in a project tree that
it's inventorying.   It has to read all of the directories.   This is
semantically mandatory.

But that should be pretty economical.  Even 15,000 stats, if the
inodes are in the cache, and a few K hundred directory reads, if those
are in the cache, should be pretty damn cheap.  Even cold, it's
decreasingly horrible over economic time.  There is a class of users
who want to work on trees of that size on what is, by today's
standards, pretty dinky hardware.  But that class, as a % of users, is
going to do nothing but shrink rapidly over time.  Meanwhile, most
people doing serious and sustained work on trees of that size should
be able to afford traversing their project trees even today.
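For a sense of what a stat-only traversal involves, here is a rough
Python sketch.  The function name stat_traversal is hypothetical and
this is not how tla walks a tree; it only shows that such a pass needs
one directory read per directory and one stat per entry, and never
opens a file's contents.

```python
import os
from typing import Dict, List, Tuple


def stat_traversal(root: str) -> Tuple[Dict[str, os.stat_result], int]:
    """Walk a tree, stat every entry, and count the directory reads."""
    stats: Dict[str, os.stat_result] = {}
    dir_reads = 0
    pending: List[str] = [root]
    while pending:
        directory = pending.pop()
        dir_reads += 1  # one directory read per directory visited
        with os.scandir(directory) as entries:
            for entry in entries:
                # Stat the entry itself; do not follow symlinks.
                stats[entry.path] = entry.stat(follow_symlinks=False)
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
    return stats, dir_reads
```

The length of the returned dictionary is the number of stats and the
second value is the number of directory reads, which are the two
quantities the cost estimate above depends on.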

Arch _somewhat_ needs to traverse revision library trees and
pristines.  It depends on how much you trust those trees.  Traversing
them allows the inode signature to be validated.  It'd be a little
weird -- but I wouldn't object to an option that told tla "just trust
the revision library (at your own, slight, risk)".  As an option,
people could at least use that in a pretty reliable way.

The cost of an inventory, though, _can_ be much more than just a
stat'ing traversal.   If the inode cache is stale, an implicit tree
will have to examine lots of source files (their contents, not their
inodes).   _Currently_, and this is what I'm suggesting you fix,
explicit ".id" files must be read for every (tag reading) inventory.

Now if you have, say, 15,000 files, mostly or entirely explicitly
tagged, that means that a tagful inventory is going to read 15,000
small files.  And, actually, the way it works, it's going to read
those .id files twice each.  That's a pretty big jump from just a
traversal in system call count, kernel cache pressure, and i/o
bandwidth pressure when the cache lets you down.   And it's that
difference -- the difference caused by having to read those .id files
-- that I'm suggesting you get rid of.
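The size of that jump can be put in rough numbers.  The sketch below
is only back-of-the-envelope arithmetic: the 15,000 files and the two
reads per ".id" file come from the text above, while the figure of
three system calls per file read (open, read, close) is an assumption
for illustration, and the function names are hypothetical.

```python
def extra_id_reads(files: int, reads_per_id_file: int = 2) -> int:
    """Number of '.id' file reads added on top of the plain traversal."""
    return files * reads_per_id_file


def extra_syscalls(files: int, reads_per_id_file: int = 2,
                   syscalls_per_read: int = 3) -> int:
    """Assumed model: each small-file read costs open + read + close."""
    return extra_id_reads(files, reads_per_id_file) * syscalls_per_read


if __name__ == "__main__":
    files = 15_000
    print("stats in the traversal:", files)
    print("extra .id reads:", extra_id_reads(files))
    print("extra syscalls (assumed model):", extra_syscalls(files))
```

Under those assumptions, a fully explicit 15,000-file tree adds 30,000
small-file reads, or about 90,000 system calls, on top of the 15,000
stats the traversal already needs.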

Beyond that suggested speedup, there are probably some other small
ways to cut down the syscall count without going all the way to the
(bogus) proposal of not traversing trees.

    > One of the things that Miles and others on the list convinced me of was
    > that my reverse mapping was really just an extension of the inode sigs. 
    > Combined with the partial inventories, a well indexed inode sig would
    > work.

I really don't understand what more you think is needed.

Inode signature files are pretty small.   It doesn't take much to read
them.  You can reverse them and index them 7 different ways from
Tuesday in-core at low expense.   If you're convinced that inode sigs
are all you need -- well, that's already done.
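As an illustration of how cheap the in-core reversing is, here is a
small Python sketch.  It assumes the inode signatures have already
been parsed into a dictionary from id to (device, inode, size, mtime);
that layout and the name index_signatures are hypothetical and do not
describe tla's on-disk format.

```python
from typing import Dict, Tuple

# Assumed in-memory form of one inode signature.
Signature = Tuple[int, int, int, int]  # (device, inode, size, mtime)


def index_signatures(
    sigs: Dict[str, Signature],
) -> Tuple[Dict[Tuple[int, int], str], Dict[Signature, str]]:
    """Reverse an id -> signature map into two lookup tables."""
    by_inode: Dict[Tuple[int, int], str] = {}
    by_signature: Dict[Signature, str] = {}
    for file_id, sig in sigs.items():
        device, inode, _size, _mtime = sig
        by_inode[(device, inode)] = file_id  # which id lives at this inode?
        by_signature[sig] = file_id          # exact match: inode unchanged
    return by_inode, by_signature
```

Both tables are built in a single pass over the signatures, so for a
tree of 15,000 files the cost is a few thousand dictionary insertions.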

Earlier you seemed to want something more.   You wanted to have an
inode sig file that didn't map to a specific revision -- but did
reliably map ids to inodes no matter what the user does to the tree.
That, of course, can't be done.   Well -- actually it can be done --
but you'd need a "smart filesystem" that updated the index in response
to system calls rather than tla operations.

That possibility -- the "smart file system" -- sure sounds exotic and
unlikely but I _can_ see it being worthwhile.   If you wanted to make
a distro that was _really_ _really_ tuned for developers .....

-t
