monotone-devel

From: Daniel Carosone
Subject: Re: cvssync (was Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?))
Date: Fri, 8 Sep 2006 21:12:14 +1000
User-agent: Mutt/1.5.13 (2006-08-11)

On Fri, Sep 08, 2006 at 11:25:38AM +0200, Markus Schiltknecht wrote:
> To understand how certs are stored, I took a look at schema.sql and found:
> 
> CREATE TABLE revision_certs
> (
>   hash not null unique,   -- hash of remaining fields separated by ":"
>   id not null,            -- joins with revisions.id
>   name not null,          -- opaque string chosen by user
>   value not null,         -- opaque blob
>   keypair not null,       -- joins with public_keys.id
>   signature not null,     -- RSA/SHA1 signature of "address@hidden:val]"
>   unique(name, id, value, keypair, signature)
> );
> 
> Now, I understand most of it, only what are 'remaining fields'? 

Literally, the rest of the fields: id, name, etc. They're
concatenated together as address@hidden:val] (they *should* be basic_io, and
will be after the next iteration; see the CertCleanup wiki page). The
hash of this string is what gets stored as hash, and the string is then
signed and the signature stored as signature.

> How about only using compression? (Or is the cert value already compressed?)

It's not, because we often want to search on it, e.g. by branch name.
Apart from changelogs, the strings are also short: the SHA1 hash is
usually the largest component.
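
A minimal sketch of why compression would get in the way, using
Python's sqlite3 module and a simplified revision_certs table (only id,
name and value; the data are made up). Plain values can be matched with
"=" or a LIKE prefix, while the same values stored zlib-compressed no
longer match.

```python
import sqlite3
import zlib

db = sqlite3.connect(":memory:")
certs = [
    ("rev1", "branch", "net.venge.monotone"),
    ("rev2", "branch", "net.venge.monotone.cvssync"),
    ("rev3", "changelog", "Fix cert handling"),
]

# Values stored as plain text, as in the current schema.
db.execute("CREATE TABLE revision_certs (id TEXT, name TEXT, value TEXT)")
db.executemany("INSERT INTO revision_certs VALUES (?, ?, ?)", certs)

# The same values stored zlib-compressed, for comparison.
db.execute("CREATE TABLE compressed_certs (id TEXT, name TEXT, value BLOB)")
db.executemany("INSERT INTO compressed_certs VALUES (?, ?, ?)",
               [(i, n, zlib.compress(v.encode())) for i, n, v in certs])

exact = [r for (r,) in db.execute(
    "SELECT id FROM revision_certs WHERE name = 'branch' AND value = ?",
    ("net.venge.monotone",))]
prefix = [r for (r,) in db.execute(
    "SELECT id FROM revision_certs WHERE name = 'branch' "
    "AND value LIKE 'net.venge.monotone%' ORDER BY id")]
compressed = [r for (r,) in db.execute(
    "SELECT id FROM compressed_certs WHERE name = 'branch' AND value = ?",
    ("net.venge.monotone",))]

print(exact)       # ['rev1']
print(prefix)      # ['rev1', 'rev2']
print(compressed)  # []: the stored bytes no longer equal the branch name
```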

> To get humble and more real now: is this an issue at all? (Except for 
> CVS revision info which should better be stored at other places.) If 
> not, at least I understand monotone better, now ;-)

Not really. We could save some space with a little extra SQL, such as
storing names as short foreign keys to an adjunct table of known cert
names rather than repeating the name string every time, but this isn't
really a *major* issue.  It's also entirely contained within the
storage layer, so we can change it any time without needing anything
more than a 'db migrate'.
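
A sketch of that name-table idea, again with sqlite3. The table and
column names (cert_names, revision_certs_compact, name_id) and the
helper add_cert are hypothetical, not monotone's schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical adjunct table: each distinct cert name is stored once.
db.execute("CREATE TABLE cert_names (id INTEGER PRIMARY KEY, "
           "name TEXT NOT NULL UNIQUE)")
# Certs refer to the name by a small integer key instead of repeating it.
db.execute("CREATE TABLE revision_certs_compact ("
           "rev_id TEXT NOT NULL, "
           "name_id INTEGER NOT NULL REFERENCES cert_names(id), "
           "value TEXT NOT NULL)")

def add_cert(rev_id: str, name: str, value: str) -> None:
    # Register the name if it is new, then look up its key.
    db.execute("INSERT OR IGNORE INTO cert_names (name) VALUES (?)", (name,))
    (name_id,) = db.execute("SELECT id FROM cert_names WHERE name = ?",
                            (name,)).fetchone()
    db.execute("INSERT INTO revision_certs_compact VALUES (?, ?, ?)",
               (rev_id, name_id, value))

for rev in ("rev1", "rev2", "rev3"):
    add_cert(rev, "branch", "net.venge.monotone")
    add_cert(rev, "changelog", f"commit message for {rev}")

# Joining back recovers the full cert name for queries and display.
rows = db.execute(
    "SELECT c.rev_id, n.name, c.value FROM revision_certs_compact c "
    "JOIN cert_names n ON n.id = c.name_id "
    "WHERE n.name = 'branch' ORDER BY c.rev_id").fetchall()
distinct_names = db.execute("SELECT COUNT(*) FROM cert_names").fetchone()[0]
print(distinct_names)  # 2: six certs, but only two stored name strings
print(rows[0])         # ('rev1', 'branch', 'net.venge.monotone')
```

Since the join lives entirely in the storage layer, callers still see
full cert names, which is what makes this a 'db migrate'-only change.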
  
> Another thought I had was using some sort of 'inverted indexes' to store 
> 'flag-certs' (which don't have a value, but are boolean in the sense 
> that attached = true, missing = false), i.e.:
> 
> flag cert 'PUSHED' is attached to revisions A, C, D and E,
> flag cert 'COMPILES_CLEANLY' is attached to revisions A, B, C and E

Essentially, SQL indexes perform this function for us; this is why the
certs are stored in 'expanded' SQL tabular form, rather than
serialised string form (as some other data are, usually compressed).
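
For instance, the unique(name, id, ...) constraint in the schema quoted
above already makes SQLite build an index whose leading columns are
(name, id), so "which revisions carry flag X" can be answered from that
index. The sketch uses a cut-down table with the same kind of
constraint, assumes flag certs carry an empty value, and reproduces the
PUSHED / COMPILES_CLEANLY example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Cut-down revision_certs; as in the quoted schema, the UNIQUE constraint
# makes SQLite create an index whose leading columns are (name, id).
db.execute("CREATE TABLE revision_certs (id TEXT NOT NULL, name TEXT NOT NULL, "
           "value TEXT NOT NULL, UNIQUE(name, id, value))")

# Flag certs, assumed here to carry an empty value.
flags = {"PUSHED": "ACDE", "COMPILES_CLEANLY": "ABCE"}
for name, revs in flags.items():
    db.executemany("INSERT INTO revision_certs VALUES (?, ?, '')",
                   [(rev, name) for rev in revs])

def revs_with(flag: str) -> list:
    # SQLite can serve this lookup from the (name, id) index.
    return [r for (r,) in db.execute(
        "SELECT id FROM revision_certs WHERE name = ? ORDER BY id", (flag,))]

# Revisions carrying both flags: the intersection of the two lookups.
both = [r for (r,) in db.execute(
    "SELECT id FROM revision_certs WHERE name = 'PUSHED' "
    "INTERSECT "
    "SELECT id FROM revision_certs WHERE name = 'COMPILES_CLEANLY' "
    "ORDER BY id")]

print(revs_with("PUSHED"))  # ['A', 'C', 'D', 'E']
print(both)                 # ['A', 'C', 'E']
```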
 
> Or even better: use the manifest to store that information... AFAICT 
> manifests are delta compressed and store the filenames and file 
> revisions anyway. Why not store 'origin VCS' information from imports 
> there? Per revision that would be, looks like a much better fit for 
> other VCS like svn and git, too. I.e:

That's exactly how attrs are stored, which is why we propose to use
them for this.
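
To illustrate why that is cheap: since manifests are stored as deltas,
adding an attr to one file adds only one line to the delta. The sketch
uses a made-up, simplified manifest layout (not monotone's real format)
and a hypothetical "cvs:revision" attr name, with difflib standing in
for the real delta encoder.

```python
import difflib

def manifest_lines(files: dict) -> list:
    # Made-up, simplified manifest layout (not monotone's real format):
    # one "file" line per path, then one indented "attr" line per attribute.
    lines = []
    for path in sorted(files):
        content, attrs = files[path]
        lines.append(f'file "{path}" content [{content}]')
        for key in sorted(attrs):
            lines.append(f'  attr "{key}" "{attrs[key]}"')
    return lines

old = {"foo.c": ("1111", {}), "bar.c": ("2222", {})}
# Record a hypothetical CVS origin attr on one file.
new = {"foo.c": ("1111", {"cvs:revision": "1.7"}), "bar.c": ("2222", {})}

# difflib stands in for the real delta encoder: keep only changed lines.
delta = [line for line in difflib.unified_diff(
             manifest_lines(old), manifest_lines(new), lineterm="")
         if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
print(delta)  # ['+  attr "cvs:revision" "1.7"']
```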

So, good ideas, and luckily they're already in use! :)

--
Dan.
