Re: [Monotone-devel] CVS sync works (for me)


From: Christof Petig
Subject: Re: [Monotone-devel] CVS sync works (for me)
Date: Mon, 21 Feb 2005 15:08:53 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.5) Gecko/20050105 Debian/1.7.5-1

Nathaniel Smith wrote:
On Mon, Feb 21, 2005 at 08:25:28AM +0100, Christof Petig wrote:

So it looks like I will have to resort to diff encoding.

Please ignore my stats: on second reading, I had copied and pasted the
monotone.db stats. Here are the stats for _my_ project:

file_deltas|19196|7450081
files|2401|21156771
manifest_deltas|3604|1526596
manifests|8|159796
merkle_nodes|0|0
private_keys|1|920
public_keys|1|283
revision_ancestry|3612|288800
revision_certs|18060|76092176
revisions|3612|2707209

revision_certs|author|3612|1189416
revision_certs|branch|3612|1175528
revision_certs|changelog|3612|1245468
revision_certs|cvs-revisions|3612|71351208
revision_certs|date|3612|1130556
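
To make the proportions in the per-cert figures above explicit, here is a
minimal Python sketch. It assumes the columns are table, cert name, row count
and total bytes (the message does not label them), and the function names are
made up for illustration:

```python
# Per-cert statistics copied from the message.
# Assumed column meaning: table | cert name | rows | total bytes.
PER_CERT_STATS = """\
revision_certs|author|3612|1189416
revision_certs|branch|3612|1175528
revision_certs|changelog|3612|1245468
revision_certs|cvs-revisions|3612|71351208
revision_certs|date|3612|1130556
"""


def parse_cert_stats(text):
    """Return {cert_name: (rows, total_bytes)} from pipe-separated lines."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _table, name, rows, size = line.split("|")
        stats[name] = (int(rows), int(size))
    return stats


def summarize(stats):
    """Return {cert_name: (average bytes per cert, share of all cert bytes)}."""
    total = sum(size for _rows, size in stats.values())
    return {
        name: (size / rows, size / total)
        for name, (rows, size) in stats.items()
    }


if __name__ == "__main__":
    for name, (average, share) in summarize(parse_cert_stats(PER_CERT_STATS)).items():
        print(f"{name:15s} avg {average:10.1f} bytes  share {share:6.1%}")
```

Under that reading, the five per-cert byte totals add up to the revision_certs
total listed above, and the cvs-revisions certs account for nearly all of it.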

Would it be possible to simply store less information?  A single
(path, new-id) pair is enough to uniquely identify a CVS checkin,
right, and won't trigger the incredible nastiness of crawling all
over the place gathering certs (that hopefully we trust, but who
knows) before we can figure out what's going on?

if you look at the contents of the cert, e.g.:
cvs.midgard.berlios.de:/cvsroot/midgard/midgard
1.5 .cvsignore
1.6 .glademm-callbacks
1.2 AUTHORS
1.399 ChangeLog
1.8 Makefile.am
1.472 NEWS
1.2 NEWS_old
1.11 NOTIZEN
1.22 README
1.84 TODO
1.1.1.1 acconfig.h
1.7 autogen.sh
1.110 configure.in
1.1 docs/About.html
[...]

you see that for larger projects only a few of the files change per
revision (edge). But such a cert can easily grow to 1817 lines and 31.5k
bytes. If you multiply that by the number of edges (3300 for this
project) you get about 90MB! So I decided to store only the changed
files, like this:

cvs.midgard.berlios.de:/cvsroot/midgard/midgard   (repository)
+ebf337072571135affe49b5da42b7342ddba0852         (last revision)
- dir/deleted
1.5 dir/changed
1.1 dir/added

This should get the size down to a reasonable amount and is readable
enough to be able to verify by sight. [That's actually the reason I
refrained from reverse diffing (store the last cert in full length and
recode older ones as time-backwards-diffs)]. I have to read and process
all the certs anyway.
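
A rough sketch of how such a changes-only cert could be produced and replayed,
in Python. This is not the actual implementation: the function names are
hypothetical, and I assume the parenthesised annotations in the example above
are explanatory and not part of the stored cert.

```python
def encode_delta(repository, last_revision, old_files, new_files):
    """Build a changes-only cert from two {path: cvs_revision} mappings."""
    lines = [repository, "+" + last_revision]
    # Files that existed before but are gone now.
    for path in sorted(old_files.keys() - new_files.keys()):
        lines.append("- " + path)
    # Files that were added or whose CVS revision changed.
    for path in sorted(new_files):
        if old_files.get(path) != new_files[path]:
            lines.append(new_files[path] + " " + path)
    return "\n".join(lines) + "\n"


def apply_delta(old_files, cert_text):
    """Replay a changes-only cert on top of the previous file state."""
    lines = [line for line in cert_text.splitlines() if line.strip()]
    repository = lines[0]
    last_revision = lines[1][1:]  # drop the leading "+"
    new_files = dict(old_files)
    for line in lines[2:]:
        marker, path = line.split(" ", 1)
        if marker == "-":
            new_files.pop(path, None)
        else:
            new_files[path] = marker  # marker is the CVS revision here
    return repository, last_revision, new_files


if __name__ == "__main__":
    old = {"dir/deleted": "1.2", "dir/changed": "1.4", "dir/same": "1.1"}
    new = {"dir/changed": "1.5", "dir/added": "1.1", "dir/same": "1.1"}
    cert = encode_delta(
        "cvs.midgard.berlios.de:/cvsroot/midgard/midgard",
        "ebf337072571135affe49b5da42b7342ddba0852",
        old,
        new,
    )
    print(cert, end="")
    assert apply_delta(old, cert)[2] == new
```

Reconstructing the full file list for a revision then means replaying every
cert from the root forward, which matches having to read and process all the
certs anyway.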

The reason I need older certs as well is to enable the correct rooting
of branches (once supported).

    Christof

PS: Ever thought about putting an index on revision_certs.id? Perhaps
this speeds up correctness verification (just guessing), and since data
retrieval is more likely than data modification I cannot see drawbacks.
The same might apply to other tables with a large number of rows
[e.g. manifest_deltas.id, file_deltas.id].
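
To illustrate the index idea, a small sketch using Python's sqlite3 module.
The table layout here is a simplified stand-in, not monotone's real schema,
and the index name is made up:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the revision_certs table.
conn.execute(
    "CREATE TABLE revision_certs (id TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL)"
)

LOOKUP = "SELECT name, value FROM revision_certs WHERE id = ?"

# Without an index, looking up certs by id scans the whole table.
plan_before = query_plan(conn, LOOKUP, ("some-revision-id",))

# With an index on id, SQLite can search the index instead.
conn.execute("CREATE INDEX revision_certs__id ON revision_certs (id)")
plan_after = query_plan(conn, LOOKUP, ("some-revision-id",))

if __name__ == "__main__":
    print("before:", plan_before)
    print("after: ", plan_after)
```

One cost to keep in mind is that every index has to be updated on insert and
takes extra space in the database file, so whether it pays off depends on how
read-heavy the workload really is.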

Attachment: signature.asc
Description: OpenPGP digital signature

