
From: Jason McCarty
Subject: Re: [Gnu-arch-users] Re: situations where cached revisions are not so good
Date: Sat, 27 Sep 2003 02:08:56 -0400
User-agent: Mutt/1.5.4i

Miles Bader wrote:
> Jason McCarty <address@hidden> writes:
> > I'm not sure anymore that summary-deltas would be a significant
> > improvement over downloading all the revisions needed after a cachedrev.
> For emacs they would be a _drastic_ improvement for people that already
> have any kind of pristine/library entry.  A cached rev for emacs is
> 20MB, a typical summary delta would be a couple hundred KB (if that).

Well, I don't doubt that they would take less space than cachedrevs, but
I'm wondering how much better they would be than just downloading and
applying the revisions that they summarize. Cachedrevs vs. summary
deltas may be a time/space tradeoff too.

> Note that when I think of a summary delta, I'm thinking of something
> like:
>     base-0      Cached-Rev, 20MB
>     patch-1     Patch, 10KB
>     ...
>     patch-49    Patch, 20KB
>     patch-50    Patch, 10KB
>                 Cached-Summary-Delta-0-50, 150KB
>                 Cached-Rev, 20MB

Are these real numbers, or made-up? I looked at tla--devo--1.1 (the only
semi-large tree I feel like downloading), and cachedrevs do indeed use a
lot of extra space (in the case of tla, ~100 revisions worth, more than
half the cumulative revision size).
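To make the time/space tradeoff concrete, here is a minimal sketch (Python, with hypothetical sizes loosely modeled on the emacs numbers quoted above; none of this is tla's actual code) comparing bytes transferred to construct a given revision under each strategy:

```python
# Hypothetical sizes (bytes), loosely modeled on the emacs example above.
CACHEDREV = 20 * 1024 * 1024   # full cached revision at base-0
PATCH = 15 * 1024              # average individual changeset
SUMMARY_DELTA = 150 * 1024     # one delta summarizing patch-1..patch-n

def cost_patches(n):
    """Fetch the base-0 cachedrev, then fetch and apply n patches."""
    return CACHEDREV + n * PATCH

def cost_summary_delta(n):
    """Fetch the base-0 cachedrev plus one summary delta spanning n patches."""
    return CACHEDREV + SUMMARY_DELTA

def cost_cachedrev_at_target(n):
    """Fetch a full cached revision stored at patch-n directly."""
    return CACHEDREV

for n in (10, 50):
    print(n, cost_patches(n), cost_summary_delta(n), cost_cachedrev_at_target(n))
```

The transfer cost of a cachedrev at the target and of a summary delta converge for large n, but the cachedrev costs the archive a full tree of storage per revision cached, while the summary delta costs only the (much smaller) delta, which is the space side of the tradeoff.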

Well, it's many hours later and I've done a little benchmarking. I don't
want to report my results until I reach a conclusion about a good course
of action, but initial tests are very promising. Summary deltas taken
every 200KiB of cumulative changeset size in tla--devo--1.1 are on
average half as big as the changesets they span. Applying a summary
delta also uses far less CPU than applying each revision individually,
by close to two orders of magnitude (I wonder if this is a bug in tla).
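The "every 200KiB" placement rule I used can be sketched like this (a hypothetical helper illustrating the rule, not my actual benchmark code):

```python
def summary_delta_spans(changeset_sizes, threshold=200 * 1024):
    """Walk the revisions in order and start a new summary delta each
    time the cumulative changeset size since the last boundary reaches
    the threshold (200KiB here).  Returns (start, end) revision pairs,
    each the span one summary delta would cover."""
    spans = []
    start, running = 0, 0
    for i, size in enumerate(changeset_sizes, start=1):
        running += size
        if running >= threshold:
            spans.append((start, i))
            start, running = i, 0
    return spans

# e.g. 30 changesets of 20KiB each -> a boundary every 10 revisions
print(summary_delta_spans([20 * 1024] * 30))
# -> [(0, 10), (10, 20), (20, 30)]
```

Tying placement to cumulative changeset size rather than revision count means bursts of large commits get summarized more often, which keeps the worst-case download roughly bounded.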

> Of course it would be nice if there were an algorithm to pick the
> optimal set of things to do based on existing trees/bandwidth/etc.,
> but probably heuristics could do alright too, at least better than
> the current state...

I'll try to look at that tomorrow, after I do some more benchmarking :-)
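One way to frame that selection problem: treat every locally available tree, cachedrev, summary delta, and patch as an edge with a byte cost, and run a shortest-path search to the wanted revision. A rough sketch (Dijkstra over hypothetical costs; not anything tla implements today):

```python
import heapq

def cheapest_plan(target, edges):
    """edges: (from_rev, to_rev, cost_bytes, kind) tuples, where
    from_rev of None means the item is reachable from scratch (a free
    local pristine tree, or a cachedrev fetched whole).  Returns
    (total_cost, list of kinds) for the cheapest way to build `target`,
    or None if it is unreachable."""
    adj, heap = {}, []
    for frm, to, cost, kind in edges:
        if frm is None:
            heap.append((cost, to, [kind]))
        else:
            adj.setdefault(frm, []).append((to, cost, kind))
    heapq.heapify(heap)
    done = set()
    while heap:
        cost, rev, path = heapq.heappop(heap)
        if rev == target:
            return cost, path
        if rev in done:
            continue
        done.add(rev)
        for nxt, c, kind in adj.get(rev, ()):
            heapq.heappush(heap, (cost + c, nxt, path + [kind]))
    return None

edges = [
    (None, 0, 20_000_000, "cachedrev"),  # fetch base-0 cached rev
    (None, 0, 0, "pristine"),            # or use a local pristine copy
    (0, 50, 150_000, "summary-delta"),   # one delta covering 0..50
] + [(i, i + 1, 15_000, "patch") for i in range(50)]

print(cheapest_plan(50, edges))
# -> (150000, ['pristine', 'summary-delta'])
```

A heuristic could skip the full search and just prefer, in order: local pristine plus the nearest summary delta, then pristine plus patches, then a fresh cachedrev, which is probably close to optimal in practice.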
