gnu-arch-users

Re: [Gnu-arch-users] Storage efficiency of revlibs


From: Ludovic Courtès
Subject: Re: [Gnu-arch-users] Storage efficiency of revlibs
Date: Wed, 07 Dec 2005 14:05:51 +0100
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)

Mikhael Goikhman <address@hidden> writes:

> One of my partially populated revlibs shows ratio of 2.6 (342Mb / 130Mb).
> This ratio highly depends on many factors. Still, it is very low,
> comparable with or lower than gzip.
>
> I.e. we both may just cacherev every revision to get the same disk usage
> without needing to keep so many inodes for old revisions. And if you only
> make one cacherev per 25 revisions, you get 25 times less of disk usage.
>
> In reality, my source-only trees (containing normal {arch}, no pristines)
> produce tarball that is 10% of the "du -s tree". So for my projects, the
> tarball ratio is 10, and the revlib ratio is 2.5. Then the "cacherev
> every 50 revisions" solution is 200 times more compact than revlib.
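
Spelled out, the arithmetic behind the quoted "200 times" figure looks like this. The sketch uses only the ratios quoted above (ratio = uncompressed tree size / stored size) and, like the quoted estimate, counts only the cached tarballs:

```python
# Ratios quoted above: uncompressed tree size divided by stored size.
tarball_ratio = 10.0          # tar+gzip tarball is ~10% of "du -s tree"
revlib_ratio = 2.5            # a revlib revision costs ~1/2.5 of its tree
revisions_per_cacherev = 50   # one cached revision per 50 revisions

# Storage per revision, as a fraction of one uncompressed tree.
revlib_cost = 1 / revlib_ratio                                 # 0.4
cacherev_cost = 1 / (tarball_ratio * revisions_per_cacherev)   # 0.002

# How many times more compact the cacherev scheme is, per revision.
advantage = tarball_ratio * revisions_per_cacherev / revlib_ratio
print(f"cacherev every {revisions_per_cacherev} revisions: {advantage:.0f}x more compact")
```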

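You're comparing revlibs with tar+gzip, not with gzip alone.  This
matters a lot: because tar concatenates the files, gzip can exploit
redundancy between neighbouring files, but deflate (the algorithm behind
gzip and zlib) only looks back over a 32 KB sliding window, which on
average spans only a handful of files[0].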
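
The window effect can be observed with Python's zlib module.  In this
sketch the file contents are hypothetical random bytes: two "files"
share a 4 KB block, and we measure how much concatenating them saves
when they are adjacent versus separated by 64 KB of unrelated data (as
if other files sat between them in the tarball).  It needs Python 3.9+
for Random.randbytes.

```python
import random
import zlib

rng = random.Random(0)              # fixed seed so the sizes are reproducible
shared = rng.randbytes(4096)        # content common to both hypothetical files
file_a = shared + rng.randbytes(4096)
file_b = shared + rng.randbytes(4096)
padding = rng.randbytes(64 * 1024)  # stands in for unrelated files in the tarball


def size(data: bytes) -> int:
    """Size of the zlib (deflate) stream for data, default compression level."""
    return len(zlib.compress(data))


# Adjacent files: the second copy of `shared` starts 8 KB after the first,
# inside deflate's 32 KB window, so it is encoded as back-references.
saving_near = size(file_a) + size(file_b) - size(file_a + file_b)

# Same files, 64 KB apart: the first copy has slid out of the window,
# so concatenation saves little more than stream header overhead.
saving_far = (size(file_a) + size(padding) + size(file_b)
              - size(file_a + padding + file_b))

print(f"saved when adjacent:    {saving_near} bytes")
print(f"saved when 64 KB apart: {saving_far} bytes")
```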

Guile's source, for instance, contains 1028 files, and each revision
typically touches only a couple of them.  In my revlib, with
approximately 40 revisions (coming from two different branches), I get a
compression ratio of 8 for Guile itself.  OTOH, a complete tarball of a
single Guile revision amounts to 2584 KB, i.e., about 100 MB for 40
revisions, whereas my revlib for as many revisions uses 91 MB.
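
The tarball figure can be checked directly.  This assumes, as the
conclusion's "tar+gz" implies, that the 2584 KB tarball is gzipped, and
that KB and MB are binary units (1 MB = 1024 KB); with decimal units the
total would be about 103 MB instead.

```python
# Figures quoted above for Guile (~40 revisions).
tarball_kb = 2584   # one gzipped tarball of a single Guile revision
revisions = 40
revlib_mb = 91      # measured size of the revlib for those revisions

tarballs_kb = tarball_kb * revisions   # 103360 KB
tarballs_mb = tarballs_kb / 1024       # ~100.9 MB
print(f"{revisions} tarballs: {tarballs_mb:.1f} MB vs revlib: {revlib_mb} MB")
print(f"revlib is {tarballs_mb / revlib_mb:.2f}x smaller")
```

A gap of roughly 10% is what "(slightly) better" refers to.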

Conclusion:

  On projects with a small fraction of modified files per revision, the
  revlib technique yields a (slightly) better compression ratio than a
  tar+gz of each revision.

On a project with many files, most of which remain identical across
revisions, revlib can achieve compression not achievable otherwise: it
can compress /across/ revisions.  However, it is true that the files
themselves remain uncompressed in revlibs, which may yield a compression
ratio not quite as good as expected.
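
Cross-revision sharing can be modelled with a toy history shaped like
the Guile example: 1028 files, 40 revisions, each revision modifying a
couple of files.  The uniform 2 KB file size, the two touched files per
revision and the file names are all hypothetical, and the model assumes
that an unchanged file is a hard link to the predecessor revision's copy
(inode and directory overhead are ignored).  It illustrates the
mechanism only; it does not reproduce the measured figures.

```python
# Hypothetical history modelled on the Guile figures above.
NUM_FILES = 1028     # file count quoted for Guile
NUM_REVISIONS = 40   # roughly the number of revisions in the revlib
FILE_SIZE = 2048     # assumption: every file is 2 KB
TOUCHED = 2          # assumption: each revision modifies two files


def make_history():
    """Return a list of trees (path -> contents), one per revision."""
    tree = {f"file{i:04d}": bytes(FILE_SIZE) for i in range(NUM_FILES)}
    history = [tree]
    for r in range(1, NUM_REVISIONS):
        tree = dict(tree)                          # start from the predecessor
        for k in range(TOUCHED):
            path = f"file{(TOUCHED * r + k) % NUM_FILES:04d}"
            tree[path] = bytes([r]) * FILE_SIZE    # new contents in revision r
        history.append(tree)
    return history


def full_trees_size(history):
    """Bytes used if every revision is stored as an independent full tree."""
    return sum(len(data) for tree in history for data in tree.values())


def revlib_size(history):
    """Bytes used if unchanged files are hard links to the predecessor's
    copy, so only added or modified files consume new space."""
    total = 0
    previous = {}
    for tree in history:
        for path, data in tree.items():
            if previous.get(path) != data:
                total += len(data)
        previous = tree
    return total


history = make_history()
full, linked = full_trees_size(history), revlib_size(history)
print(f"full trees: {full} bytes, revlib: {linked} bytes, ratio {full / linked:.1f}")
```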

The most space-efficient solution would be to augment the revlib
technique by gzipping each file individually.
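
A minimal sketch of that combination, assuming (as above) that unchanged
files are shared with the predecessor revision and that each newly stored
file is then gzipped on its own.  The two-revision history of C-like
files is hypothetical.  Per-file compression cannot exploit redundancy
between different files the way tar+gz does, but it keeps the
cross-revision sharing; files would also have to be decompressed before
use, which this sketch does not model.

```python
import gzip


def revlib_sizes(revs):
    """Return (plain, gzipped) byte counts for a revlib-like store in which
    unchanged files are shared with the predecessor revision and, in the
    gzipped variant, each stored file is compressed individually."""
    plain = gzipped = 0
    previous = {}
    for rev in revs:
        for path, data in rev.items():
            if previous.get(path) != data:      # new or modified: stored once
                plain += len(data)
                gzipped += len(gzip.compress(data))
        previous = rev
    return plain, gzipped


# Hypothetical two-revision history of C-like source files.
base = {f"src/mod{i}.c": (f"int mod{i}_fn(int x) {{ return x + {i}; }}\n" * 100).encode()
        for i in range(50)}
rev2 = dict(base)
rev2["src/mod7.c"] = base["src/mod7.c"] + b"/* fix */\n"   # one modified file

plain, gzipped = revlib_sizes([base, rev2])
print(f"plain revlib: {plain} bytes, per-file gzip revlib: {gzipped} bytes")
```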

BTW, as Stefan noted, comparing cachedrevs and revlibs would only make
sense if cachedrevs could be used as transparently as revlibs.

Thanks,
Ludovic.

[0] http://www.cs.ucsc.edu/~elm/Papers/msst98ltref.pdf
