
[Gnu-arch-users] Re: archive storage format comments on the size

From: Andrea Arcangeli
Subject: [Gnu-arch-users] Re: archive storage format comments on the size
Date: Tue, 30 Sep 2003 20:18:25 +0200
User-agent: Mutt/1.4.1i

Hi Pau,

On Tue, Sep 30, 2003 at 07:09:38PM +0200, Pau Aliagas wrote:
> On Tue, 30 Sep 2003, Andrea Arcangeli wrote:
> > On Tue, Sep 30, 2003 at 11:41:34AM -0400, Miles Bader wrote:
> > > None the less, avoiding the overhead of applying lots of little 
> > > changesets is
> > > still often desirable, which is the reason for the other thread on summary
> > > deltas.
> > 
> > agreed, but I prefer the cache to happen locally or I would lose
> > information during network transfer.
> That's not right. You don't lose any information using or not cached 
> revision or cached libraries:
> -cached revisions
>  * are stored in the archive
>  * can be added (tla cacherev) or deleted (tla uncacherev)
>  * are not transferred in mirrors or gets of your archive

and in turn the transfer will be very slow (I need to transfer all the
granular patchsets; if there are 13000 patchsets I need all of them, or
cvsps -f will never be able to work reasonably over the network).

So either I lose info (tla archive-mirror --cached-tags) or I get
the info very inefficiently from the dumb server (tla
archive-mirror --no-cache).

I need the info with --no-cache to get cvsps-equivalent
functionality, and I have to ignore the remote cached revisions. That's
why I said cachedrevs only work locally for me. And as you said, locally
I want to use revision libs anyway.

So I don't see myself using cachedrevs anytime soon: remotely I can't
use them because I need all 13000 changesets, and locally hardlinked
revision libs are an order of magnitude faster and may take less
space too.
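The hardlink trick behind revision libraries can be sketched in shell. This is only an illustration, not tla's real layout: the `revlib/patch-NNN` paths and file contents are made up.

```shell
set -e
# hypothetical layout: revlib/patch-100 is an existing revision tree
mkdir -p revlib/patch-100
echo "hello v100" > revlib/patch-100/file.txt

# hardlink the whole tree: near-instant and ~zero extra disk space,
# since no file data is copied, only directory entries
cp -al revlib/patch-100 revlib/patch-101

# "apply" the next changeset by writing a new file and mv'ing it into
# place: mv swaps the inode, so the hardlink is broken safely and
# patch-100 keeps its old content
echo "hello v101" > file.new
mv file.new revlib/patch-101/file.txt
```

This is why building revision N+1 from a library copy of revision N costs about a second regardless of tree size: only the files touched by the changeset ever get new data blocks.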

Superpatchsets will solve the network transfer problem: the dumb server
will send me a single superpatchset that includes the last 10000
changesets, compressed with bzip2 at something better than 20x
compression. Superpatchsets will compact the size of the archive as well
(potentially lowering it close to the 450M of the uncompressed cvs
linux-2.5). Finally, superpatchsets can optionally be uncompressed all
at once in /dev/shm, and they will be applied much faster than the
current tiny patchsets in .tar.gz format, so the _first_ checkout will
be improved too. Then after the first checkout I just generate the
revision lib starting from changeset 12000, so I've only 100 patchsets
to apply during a checkout (the cost of hardlinking the revision tree is
1 second or similar).
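The compression win comes from giving bzip2 one big stream instead of many tiny ones. A minimal sketch, with a toy archive of 50 fake changesets (the layout and file names are invented, not arch's actual format):

```shell
set -e
# hypothetical dumb-archive layout: one tiny .tar.gz per changeset
mkdir -p archive
for i in $(seq 1 50); do
  echo "patch body $i: repetitive diff text shared across changesets" > patch-$i.txt
  tar czf archive/patch-$i.tar.gz patch-$i.txt
  rm patch-$i.txt
done

# superpatchset: unpack everything once and recompress it as ONE bzip2
# stream, so the compressor sees the redundancy between neighbouring
# changesets instead of compressing each one in isolation
mkdir -p unpacked
for f in archive/*.tar.gz; do tar xzf "$f" -C unpacked; done
tar cjf superpatchset.tar.bz2 -C unpacked .
```

Comparing `superpatchset.tar.bz2` against the total size of the per-changeset tarballs shows the single stream is much smaller, because per-file gzip headers and tar padding dominate tiny archives and cross-changeset redundancy is never exploited.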

cvsps-like searches will be improved as well by unpacking the
superpatchset (or superpatchsets) into /dev/shm and then searching them
in plaintext, without multiple tar.gz decompressions in between.
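The unpack-once-then-grep pattern looks like this in shell. Again a toy sketch: the changeset contents and `patches/`/`cache/` paths are hypothetical (on Linux you would point the cache at /dev/shm to keep it in RAM):

```shell
set -e
# two tiny changesets stored as tar.gz, the way the dumb server keeps them
mkdir -p patches cache
echo "fix: buffer overflow in foo()" > log-1.txt
echo "fix: off-by-one in bar()"      > log-2.txt
tar czf patches/patch-1.tar.gz log-1.txt
tar czf patches/patch-2.tar.gz log-2.txt
rm log-1.txt log-2.txt

# unpack ONCE into the cache; every later cvsps-style query is then a
# plain grep, with no per-changeset decompression inside the loop
for f in patches/*.tar.gz; do tar xzf "$f" -C cache; done
grep -l "off-by-one" cache/*
```

With thousands of changesets, the decompression cost is paid a single time instead of once per query per changeset.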

I don't think this is similar to a cachedrev, nor can a cachedrev
provide equivalent functionality.

The decision of creating a superpatchset should be based on these
properties of a group of patchsets:

1) when we access one patchset in the superpatchset, we likely want to
   access all of them and not just one (i.e. checkout/cvsps -f)
2) they're not really used frequently anyway
3) they're not used for merging updates; if an update should happen
   in the middle of a superpatchset, it has to be split first

A derivative approach could be to extend the concept and basically
always merge 10 normal patchsets into 1 superpatchset. And maybe we can
add merging support later too. But those are features we can consider
later.

This won't alter functionality at all (modulo merging); it's only a
change in the format of the database, which should boost arch in
handling a huge number of patchsets. I've already demonstrated in
practice the compression ratios we will gain, with a 2000-changeset
archive.

Andrea
