gnu-arch-users

Re: [Gnu-arch-users] archive storage format comments on the size


From: Miles Bader
Subject: Re: [Gnu-arch-users] archive storage format comments on the size
Date: Mon, 29 Sep 2003 19:10:43 -0400
User-agent: Mutt/1.3.28i

On Tue, Sep 30, 2003 at 12:39:24AM +0200, Andrea Arcangeli wrote:
> >    The space inefficiencies in arch are that it adds: contents of
> >    deleted lines and files, context of diffs, an extra copy of
> >    the log file, and some overhead costs associated with using
> 
> why can't the not strictly needed stuff be removed? We know the
> patchsets can't reject during checkout, why should we carry all this
> overhead with us when that can be deduced at runtime?

The question, of course, is `what's not needed?'

Even if you're only talking about a change to a single file, a delta in a CVS
file and an arch changeset are rather different things: the CVS delta can
generally only be applied in the strict context in which it was generated; it
is almost useless in any other context.  An arch changeset, OTOH, is useful
in many contexts (and this isn't just a theoretical advantage either; many
merging scenarios have you applying changesets in a context other than the
one in which they were created).  The current arch changeset format is optimized for
this sort of flexible usage, instead of for raw storage efficiency.  I like
to think of arch as being like the traditional `trading patches' style of
development, except with all the record-keeping taken care of for you.
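
To make the difference concrete, here is a minimal Python sketch of
context-based hunk application.  It is not arch's or patch's actual
algorithm; the function name apply_hunk and the list-of-lines representation
are assumptions made up for illustration.  The hunk is located by its
surrounding context rather than by a recorded line number, so it still
applies after unrelated lines have been added above it.

```python
def apply_hunk(lines, before, old, new, after):
    """Replace `old` with `new` where it appears between the given context lines.

    Raises ValueError unless the context matches exactly one place.
    """
    pattern = before + old + after
    width = len(pattern)
    matches = [i for i in range(len(lines) - width + 1)
               if lines[i:i + width] == pattern]
    if len(matches) != 1:
        raise ValueError(f"hunk matched {len(matches)} places, expected 1")
    start = matches[0] + len(before)
    return lines[:start] + new + lines[start + len(old):]


# Hunk recorded against one tree: "c" becomes "C", one line of context each side.
hunk = dict(before=["b"], old=["c"], new=["C"], after=["d"])

# A different tree where two unrelated lines were added at the top.
drifted = ["x", "y", "a", "b", "c", "d"]
patched = apply_hunk(drifted, **hunk)
```

A delta that stored only "replace line 3" would have hit the wrong line in
the drifted tree; the context is what lets the change find its place, or fail
loudly when it cannot.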

I suppose the `extra' info could be deduced somehow, but that obviously adds
overhead.  I think that would be especially noticeable with the
current `dumb server' network model of arch -- any excessive trawling around
in a remote archive kills you due to the network latency; presumably a `smart
server' could use different tradeoffs.
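
As a rough back-of-the-envelope model of that tradeoff (all numbers and names
below are hypothetical, not measurements of any real archive), count one
round trip per request plus the raw transfer time:

```python
def fetch_time(requests, total_bytes, rtt_s, bytes_per_s):
    """Rough transfer time: one round trip per request plus raw transfer time."""
    return requests * rtt_s + total_bytes / bytes_per_s


RTT = 0.2            # seconds per round trip (hypothetical)
BANDWIDTH = 100_000  # bytes per second (hypothetical)

# Self-contained changeset: one request, everything included.
fat = fetch_time(requests=1, total_bytes=40_000, rtt_s=RTT, bytes_per_s=BANDWIDTH)

# Slimmer changeset whose missing context must be looked up in 10 other files.
slim = fetch_time(requests=11, total_bytes=30_000, rtt_s=RTT, bytes_per_s=BANDWIDTH)
```

With these made-up numbers the slimmer changeset moves fewer bytes but takes
roughly four times as long, because the round trips dominate.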

> you can sure solve problems by throwing money into the hardware, these
> days storage is cheaper than it has ever been, but I don't
> normally take it as a good argument while developing software

Yeah, sometimes it drives me nuts when Tom uses that argument -- if I had
excess disk space (I don't!) I'd rather use it to store more _source_, unless
the inefficiency buys me something.  In the case of arch changesets,
umm.... I'd say the increased flexibility is worth it (a smart server may be
the way to go for optimizing disk space in the future, without losing
flexibility), but e.g. in the case of .arch-ids/*.id files, I don't think I
gain enough to offset the overhead.

> I understand you have to stat all files in the tree (I don't want to tla
> edit), but I don't see why you've to stat all the internal _patchsets_
> metadata inside the {arch} directory. I just don't see that.

I think it's just that for arch, {arch} is _not_ a special case for most
operations -- it's just treated as part of the source tree when
making/applying changesets etc.  I think this is _very_ clever, in that it
simplifies the implementation greatly by not requiring tons of special cases
to handle arch meta-data.  Perhaps there are optimizations that could be done
based on knowledge of the structure of {arch}, but I think that's something
that requires careful thought, as I don't think you want to change the
_semantics_ of {arch} at all.

> Is diff -u0 one of the ideas that will be implemented?

Are you _really_ sure that's what you want?  It would be very dangerous
(for the same reason that -u0 patches are dangerous in general)...
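
The danger can be seen with Python's difflib, used here only as a stand-in
for `diff -u0' (the sample lines are made up).  With zero context, a pure
insertion carries nothing but a line number in its hunk header, so there is
nothing to check against the target file before applying it.

```python
import difflib

old = ["int a;", "int c;"]
new = ["int a;", "int b;", "int c;"]


def hunk_lines(context):
    """Return the unified diff of old -> new with `context` lines of context."""
    return list(difflib.unified_diff(old, new, lineterm="", n=context))


with_context = hunk_lines(3)
zero_context = hunk_lines(0)

# Context lines in a unified diff start with a single space.
context_in_u3 = [line for line in with_context if line.startswith(" ")]
context_in_u0 = [line for line in zero_context if line.startswith(" ")]
```

Here context_in_u3 holds the two neighbouring lines, while context_in_u0 is
empty: the zero-context hunk would land wherever its line number points,
whether or not that is still the right place.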

I don't know, maybe there could be some sort of `archive crunch' operation
that went through an archive and reduced the amount of context information in
changesets, making them applicable only in the strict context of their branch
(and of course some note should be made of this so that tla would refuse to
do otherwise)...

-miles
-- 
If you can't beat them, arrange to have them beaten.  [George Carlin]



