Re: [Gnu-arch-users] [PATCH] arch speedups on big trees


From: Tom Lord
Subject: Re: [Gnu-arch-users] [PATCH] arch speedups on big trees
Date: Fri, 19 Dec 2003 11:08:45 -0800 (PST)


    > From: Chris Mason <address@hidden>

    > I've been playing around with a few ideas to improve arch performance on
    > large source trees, mostly in the area of applying changesets, and
    > creating changesets.  I've got a sample archive here with 100 changesets
    > on top of the linux 2.6 kernel, and vanilla arch takes a number of
    > minutes to apply them all (15-30 seconds per changeset via tla replay)

    > This is primarily because arch is doing an inventory of the source tree
    > before each changeset, my patch changes things to inventory only the
    > files touched by the changeset instead.  It sends a table of the
    > candidate files to the inventory funcs, and this brings the time to
    > replay my 100 changesets to ~4 seconds.

Holy crap!  Really?
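
For concreteness, here's roughly the shape I take the patch to have --
stat only the paths the changeset names instead of walking the whole
tree.  This is just my own sketch; the names below are made up and the
real inventory code in tla certainly differs:

    /* Minimal sketch (not the actual patch): stat only the paths the
     * changeset names instead of walking the whole source tree.  All
     * names here are made up for illustration.  */
    #include <stdio.h>
    #include <sys/stat.h>

    struct candidate
    {
      const char * path;   /* tree-relative path named in the changeset */
      int exists;          /* filled in by inventory_candidates */
      struct stat st;      /* stat data, valid only when exists != 0 */
    };

    /* Inventory only the candidate files, skipping the full tree walk. */
    static void
    inventory_candidates (const char * tree_root,
                          struct candidate * candidates,
                          int n_candidates)
    {
      int i;

      for (i = 0; i < n_candidates; ++i)
        {
          char full[4096];

          snprintf (full, sizeof (full), "%s/%s",
                    tree_root, candidates[i].path);
          candidates[i].exists = (lstat (full, &candidates[i].st) == 0);
        }
    }

    int
    main (void)
    {
      /* Paths a changeset might touch -- purely illustrative.  */
      struct candidate candidates[] =
        {
          { "Makefile" },
          { "kernel/sched.c" },
        };
      int i;

      inventory_candidates (".", candidates, 2);
      for (i = 0; i < 2; ++i)
        printf ("%s: %s\n", candidates[i].path,
                candidates[i].exists ? "present" : "missing");
      return 0;
    }

If that reading is right, a changeset touching a few dozen files means
a few dozen lstat calls instead of crawling many thousands of kernel
files, which would explain the numbers you're seeing.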

    > This is lightly tested (make test and a few others), 

This is the kind of change that needs to be made very carefully.

A quick scan of the technique you used suggests to me that it is
certainly not correct.  In particular, it will not work properly for
_merges_ even though it is either right or close-to-right for _exact_
patching (such as when building a revision).  Note that `replay'
(normally) counts as a merge command -- only when it is invoked from
`update' would we believe a priori that it is doing exact patching.

So there are four things here:

1) needs more testing

2) minimally, the optimization needs to be only sometimes used -- only
   when we know that a changeset is being applied exactly rather than
   as part of a merge

3) maximally, _perhaps_ it's worth trying to think about how to
   generalize the hack to handle inexact patching.   I'm not so
   sure it is though -- you can just use `update' rather than `replay'
   and meanwhile, even without generalization, the hack can speed up
   (dramatically, apparently) `get' and other cases of building a
   revision from changesets.

4) since this appears to be a huge win, performance-wise, it might 
   be interesting to take a slightly different approach that would
   be harder to code up but would get inexact patching right:

   Instead of trying to infer "what files to inventory" from the 
   contents of the changeset, an alternative is to do a full inventory
   once, for the first changeset, but then keep track of what parts
   of the filesystem are being changed along the way.

   In other words, you could achieve much the same effect by caching
   directory reads and stat calls -- and accurately invalidating
   cache entries as things change (see the sketch below).
   apply_changeset would still be doing what it thinks is a full
   inventory, but that full inventory could often hit the cache
   rather than making system calls.

   The caution is that past experience with arch has shown that it's
   hard to maintain such a cache accurately, and accuracy is critical.
   It would make _some_ sense to (mostly) implement it deep in the
   heart of VU, as a descriptor-handler layer -- but then you also
   have to watch for interactions with, for example, a fork/exec of
   patch.
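
To make that concrete, here is a very rough sketch of the cache I have
in mind -- hypothetical names only, nothing like the real VU
descriptor layer: one full inventory warms a path -> stat cache, later
inventories read from it, and every code path that touches the tree
(including a fork/exec of `patch') has to invalidate the entries it
dirtied:

    /* Rough sketch of the caching idea -- hypothetical names, not the
     * real VU descriptor layer.  One full inventory warms the cache;
     * later inventories can be answered from it; every code path that
     * changes the tree must invalidate the entries it touches.  */
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    #define CACHE_SLOTS 1024

    struct stat_cache_entry
    {
      char path[512];
      int valid;
      struct stat st;
    };

    static struct stat_cache_entry cache[CACHE_SLOTS];

    static unsigned int
    slot_for (const char * path)
    {
      unsigned int h = 5381;

      while (*path)
        h = h * 33 + (unsigned char) *path++;
      return h % CACHE_SLOTS;
    }

    /* lstat through the cache; a miss falls back to the system call
     * and fills the slot (collisions simply overwrite -- it's a cache). */
    static int
    cached_lstat (const char * path, struct stat * st)
    {
      struct stat_cache_entry * e = &cache[slot_for (path)];

      if (e->valid && !strcmp (e->path, path))
        {
          *st = e->st;
          return 0;
        }
      if (lstat (path, st) != 0)
        return -1;
      snprintf (e->path, sizeof (e->path), "%s", path);
      e->st = *st;
      e->valid = 1;
      return 0;
    }

    /* Must be called by everything that creates, renames, modifies, or
     * deletes `path' -- including after handing the tree to an external
     * `patch' process.  Miss one caller and the cache silently lies.  */
    static void
    invalidate (const char * path)
    {
      struct stat_cache_entry * e = &cache[slot_for (path)];

      if (e->valid && !strcmp (e->path, path))
        e->valid = 0;
    }

    int
    main (void)
    {
      struct stat st;

      if (cached_lstat ("Makefile", &st) == 0)  /* real lstat, fills slot */
        printf ("Makefile: %ld bytes\n", (long) st.st_size);
      invalidate ("Makefile");                  /* e.g. after patching it */
      if (cached_lstat ("Makefile", &st) == 0)  /* miss again, re-stats   */
        printf ("Makefile: %ld bytes\n", (long) st.st_size);
      return 0;
    }

The cache itself is trivial; the invalidation discipline is the hard
part, which is exactly the accuracy problem described above.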

-t




