gnu-arch-users

Re: [Gnu-arch-users] Storage efficiency of revlibs


From: Ludovic Courtès
Subject: Re: [Gnu-arch-users] Storage efficiency of revlibs
Date: Mon, 12 Dec 2005 09:11:12 +0100
User-agent: Gnus/5.110004 (No Gnus v0.4) Emacs/21.4 (gnu/linux)

Hi,

Mikhael Goikhman <address@hidden> writes:

> It makes perfect sense to me. Only if you show that this ratio is lower
> than the revlib compression ratio (du -s against du -sl), then you may
> come to your previously stated conclusion. This math is more correct,
> because it accounts for extra files stored in revlib only. So, what are
> the results of these commands for your project?

For a single revision, tar+gz will obviously always be better than no
compression at all.  Is this what you mean?

> The actual implementation of hardlinks is filesystem-dependent; there is
> usually an entry in the directory listing for each hardlink (just like
> for any filename) with a pointer to the inode. But without measuring, you
> can't tell whether it is several bytes or several kilobytes per hardlink.

Right, but even without knowing the actual implementation, it seems
reasonable to assume that the meta-data is much smaller than the data
in general (assuming an average file size of 8 KiB).
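
To put numbers on this, here is a minimal sketch (Python, not part of
tla) that approximates the `du -s' versus `du -sl' comparison on a
revlib.  The function name `link_sharing' is made up for illustration.
It sums apparent file sizes (st_size) rather than allocated disk
blocks, and it ignores directories and directory entries, so it says
nothing about the per-hardlink meta-data cost itself.

```python
import os


def link_sharing(root):
    """Return (bytes_counted_once, bytes_counted_per_link) for files under root.

    The first figure counts each inode a single time, roughly like `du -s`.
    The second counts every directory entry, roughly like `du -sl`.
    Both use apparent sizes (st_size), which is an assumption of this
    sketch: real `du` reports allocated blocks.
    """
    seen = set()   # (device, inode) pairs already counted
    once = 0
    per_link = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            per_link += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                once += st.st_size
    return once, per_link
```

Dividing the second figure by the first gives a rough idea of how much
the hard links save on a given revlib.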

> Subdirectories are not hardlink-able, they occupy at least 1 inode each,
> but often many inodes, since revlib has large and ever growing subdirs.

Right.  But again, it doesn't seem too stupid to consider this overhead
negligible compared to the gain.

> It may sound correct theoretically, but I would not be surprised if even
> this is not always true. Remember, revlib includes ever-growing ,,index*
> files that may easily become 200Kb per revision. It includes changeset
> too, and both indexes and the changeset diffs are not sharable at all.
> So any theory is just words without actual verification on real projects.
>
> And again, individually gzipped files, although they may reduce disk
> usage, produce new problems (busy CPU) and do not solve the file count
> problem.

Well, there's no such thing as a "one size fits all" solution.  You have
to make tradeoffs.  Sometimes, you want to favor storage efficiency over
CPU consumption, sometimes the opposite.  Furthermore, which compression
technique works best is highly dependent on the project you're working
on.
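
The storage/CPU tradeoff can be measured on your own files with
something like the sketch below.  It is only an illustration: the name
`compression_tradeoff' is hypothetical, and Python's zlib module stands
in for gzip here (both use DEFLATE, so sizes should be close but not
identical).

```python
import time
import zlib


def compression_tradeoff(data, levels=(1, 6, 9)):
    """Compress `data` at each level; return (level, size, seconds) tuples.

    Higher levels usually yield smaller output at the cost of more CPU
    time, but the actual figures depend entirely on the input.
    """
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results.append((level, len(packed), elapsed))
    return results
```

Feeding it the contents of a few representative source files shows
whether the extra CPU time buys a worthwhile reduction in size.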

For instance, as I showed in an earlier post, revlib hard linking does
produce good results for Guile.  Conversely, it doesn't seem to work
well for the projects you're working on.

Thanks,
Ludovic.



