Re: cvs2arch (was Re: [Gnu-arch-users] an hack.. one night long)
From: Tom Lord
Subject: Re: cvs2arch (was Re: [Gnu-arch-users] an hack.. one night long)
Date: Sat, 23 Aug 2003 13:24:48 -0700 (PDT)
> From: wave++ <address@hidden>
> > Very interesting how the tla commit time closely tracks the additional
> > size of the {arch} directory tree. I guess the only change there should
> > be the addition of one logfile, which should be very quick for diff.
> In fact, tla should only diff the pristine tree against the local
> sources and produce a patch. (Am I missing something?). This should be
> almost constant over time considering that the size of the sources isn't
> changing much.
The number of disk blocks occupied by the trees increases at almost
exactly the same rate as `commit' time. The cumulative size of the
source files increases at a smaller rate.
The near-perfect correlation of disk-space-for-one-tree with revision
number is a peculiarity of your particular project (though many
projects seem to exhibit a similar correlation for a bounded period of
their history).
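To compare the two growth rates on a given tree, a rough sketch along these lines may help. It is not part of tla; the function name `tree_usage` is made up for illustration, and it assumes a POSIX system where `st_blocks` counts 512-byte units.

```python
import os
import stat

def tree_usage(root):
    """Return (apparent_bytes, allocated_bytes, entries) for the tree at root.

    apparent_bytes  : sum of st_size over regular files (cumulative source size)
    allocated_bytes : st_blocks * 512 over every file and directory entry
    entries         : number of files and directories visited
    """
    apparent = allocated = entries = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat so that symlinks are measured, not followed
            st = os.lstat(os.path.join(dirpath, name))
            entries += 1
            allocated += st.st_blocks * 512
            if stat.S_ISREG(st.st_mode):
                apparent += st.st_size
    return apparent, allocated, entries
```

Running this against the tree at several revisions and comparing the two byte counts with the measured `commit' times would show which one the slowdown actually follows. Many small files (such as one log file per revision) tend to raise the allocated figure and the entry count much faster than the apparent size.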
So: you'll be wanting the inode signature optimization -- to spare tla
from reading so many files. As I mentioned, there's even a first-cut
at it in the patch queue.
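The general idea behind such an optimization can be sketched as follows. This is only an illustration under assumptions, not the code in the patch queue: the function names and the choice of stat fields are hypothetical. A file whose stat() fingerprint matches the one recorded for the reference tree is presumed unchanged and is never opened.

```python
import os

def inode_signature(path):
    """Cheap fingerprint of a file taken from lstat() alone (no file reads)."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)

def snapshot(root):
    """Map each file path (relative to root) to its inode signature."""
    sigs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sigs[os.path.relpath(full, root)] = inode_signature(full)
    return sigs

def files_needing_diff(root, cached):
    """Return sorted relative paths that are new, removed, or whose
    signature differs from the cached one. Only these need to be read."""
    current = snapshot(root)
    changed = {p for p, sig in current.items() if cached.get(p) != sig}
    removed = set(cached) - set(current)
    return sorted(changed | removed)
```

With a cached snapshot taken at the last commit, the cost of finding candidate files is one lstat() per file rather than a full read of both trees. The trade-off is that it trusts the file system's metadata, so a change that preserves size and timestamps would go unnoticed.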
You _might_ also be wanting a better file system -- one that imposes
less overhead, not in disk space, but in the amount of I/O needed to
read the tree.
Such filesystems have been around for a long time in the BSD world.
I'm not sure why they haven't been a higher priority in the Linux
world.
> >> arch @ patch 280 is roughly 8 times slower than arch @ patch 5 in this
> >> case.
> > Notice how it closely tracks the size of the complete archive, while the
> > size of the actual source tree isn't changing much. Either you do not
> > have a pristine tree and tla has to check out the last head revision on
> > every commit, or something fishy is going on in the code that tries to
> > diff the logs for the working directory with those of the pristine tree.
> > Only one file should be added and that can not be all that expensive for
> > diff although I guess that tla is reading all the data in both trees
> > twice, once by tla because of a bad call to safe_stat in libarch/diffs.c
> > and the second time by diff because timestamps are not trusted.
> Seems like it tries to do something with old patches (like regenerating
> the head every time).
Why do you believe that?
> I'll now try to profile it and see what happens. I already converted
> almost all my cvs repositories without problems.
> Meanwhile, anyone aware about a recent "tla tag" bug/issue?
Is it in the bug database?
-t