gnu-arch-users

Re: [Gnu-arch-users] [BUG] FEATURE PLANS: "perfect" summary deltas


From: Aaron Bentley
Subject: Re: [Gnu-arch-users] [BUG] FEATURE PLANS: "perfect" summary deltas
Date: Sat, 10 Jul 2004 14:08:22 -0400
User-agent: Mozilla Thunderbird 0.5 (X11/20040306)

Tom Lord wrote:

   aaron's alternative idea (the "separate delta directory") also aims
   to minimize roundtrips: just get a listing of that one directory
   and now you know a big chunk of the graph.  First: I think it would
   interact poorly with smart servers because a smart server may be
   willing to offer up a complete graph of deltas (so, while we only
   have one roundtrip, the bandwidth can start to look pretty
   interesting in large versions).

The archive.h-level stuff needs to be of the form "what are you willing to give me that can get me from here to here." A smart server needs to decide at that point whether it's willing to construct a given delta. If not, it can reply with anything relevant to the request. If it is willing to provide the delta, it answers with a succinct "I can give you exactly what you need".

   Second: I think it would interact
   poorly with smart servers because it requires smart servers to
   eagerly describe what deltas are available rather than seeing if,
   upon demand for a specific delta, it's handy to provide it.

It is necessary for the smart server to decide at query time (or earlier) whether it can provide a given delta. A builder needs that information to determine the best path.
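To illustrate why the builder needs those answers up front, here is a minimal Python sketch. It assumes the server's replies have already been collected into a hypothetical `offers` mapping of (from, to) revision pairs to transfer sizes. The function name, the mapping, and the sizes are all made up for illustration; none of this is tla's or archive.h's actual interface. It uses a plain Dijkstra search to pick the cheapest chain of deltas.

```python
import heapq


def cheapest_path(offers, start, goal):
    """Return (total_cost, [revisions]) for the cheapest chain of deltas.

    offers: hypothetical dict mapping (from_rev, to_rev) -> cost in KB,
            i.e. the deltas the server has said it is willing to provide.
    Returns None if the goal cannot be reached from the start.
    """
    # Build an adjacency list from the offered deltas.
    graph = {}
    for (src, dst), cost in offers.items():
        graph.setdefault(src, []).append((dst, cost))

    queue = [(0, start, [start])]  # (cost so far, revision, path taken)
    done = set()
    while queue:
        cost, rev, path = heapq.heappop(queue)
        if rev == goal:
            return cost, path
        if rev in done:
            continue
        done.add(rev)
        for nxt, step in graph.get(rev, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None


# Made-up sizes: three simple revisions, plus one direct delta the
# server has agreed to construct.
offers = {
    ("base-0", "patch-1"): 20,
    ("patch-1", "patch-2"): 20,
    ("patch-2", "patch-3"): 20,
    ("base-0", "patch-3"): 36,
}
print(cheapest_path(offers, "base-0", "patch-3"))
```

If the server declines the direct delta, that entry simply never appears in `offers`, and the same search falls back to the chain of simple revisions.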

3. Do we or do we not muck with the archive format?

   Actually, I shouldn't say "archive format".

   Do we or do we not muck with the archive _abstraction_ because,
   with a few exceptions (like checksums and signatures) when we
   change the archive format, we're implying a change to the archive
   abstraction.

   Without some fancy footwork, some of Abentley's ideas would change
   the archive abstraction in some deep ways that impact things such
   as what smart servers can do.  I see a negative impact in the
   particulars.

Yeah, and the way I see it, you're trying to change the archive abstraction without changing the archive abstraction, which means the builder has to do the generalization instead (deleting log files, for example).


   The "perfect" summary delta doesn't change the archive abstraction
   at all.

The way I see it, you've distorted the meaning of a version in order to use it for storing semi-arbitrary deltas. It's like shoving UTF-8 through an interface that was designed for ASCII: it's ugly and harder to work with than the true representation.


A. I dropped "commit --base" (aka "commit --tag") in tla and
   therefore, the current builder knows nothing about it.

Yeah, but on the other hand, the archive format doesn't distinguish commit --base from a tag commit. Tag is just commit --base with no tree changes.

C. Your experiment is great work and very comforting.

   Something that is absolutely not a priority but that might help
   shed some light at some point is an empirically supported
   characterization of "typical" changerates and change-natures and
   their determining variables, combined with analysis about how
   effective "perfect" summaries (or any alternative) is predicted to
   be.  (For now, this not being rocket science, and especially given
   checks such as yours, I trust my intuition filtered through
   feedback from others thinking about the same topic.)

pyaba can be used to determine the path of a revision changeset, so it may be helpful here.

$ pyaba revision --patch address@hidden/tlasrc--integration--1.3--patch-5
/mnt/eagerbeavershare/arch/storage/tlasrc/tlasrc/tlasrc--integration/tlasrc--integration--1.3/patch-5/tlasrc--integration--1.3--patch-5.patches.tar.gz

Oh, and remember how I wanted to be able to calculate specific deltas?

Even if we just needed the base-0 to patch-(2^x-1) revisions, we'd get this:

address@hidden:~$ du delta*.tar.gz -s --total -h
12K     delta-base-0--patch-1.tar.gz
36K     delta-base-0--patch-3.tar.gz
36K     delta-base-0--patch-7.tar.gz
40K     delta-base-0--patch-15.tar.gz
48K     delta-base-0--patch-31.tar.gz
160K    delta-base-0--patch-63.tar.gz
332K    total

But if we had base-0 and wanted patch-63, we'd just need the last one, which at 160K is about half the 332K aggregate size of the summary deltas. So the arbitrary delta approach is more space-efficient than summary deltas.

Meanwhile, the aggregate size (according to du -s --total -h) of the simple revisions from base-0 to patch-63 is 392K. (Using --apparent-size, it's actually smaller, but I'm talking about storage requirements.)
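The arithmetic behind that comparison can be checked with a short Python sketch. The sizes are copied from the du listing above and the 392K figure from the previous paragraph; the variable names are illustrative only.

```python
# Sizes in KB of the base-0 -> patch-N deltas, keyed by N (from the du listing).
summary_kb = {1: 12, 3: 36, 7: 36, 15: 40, 31: 48, 63: 160}

# Every N in the listing has the form 2^x - 1.
assert all((n + 1) & n == 0 for n in summary_kb)

aggregate_kb = sum(summary_kb.values())  # all summary deltas stored together
direct_kb = summary_kb[63]               # the one delta needed for base-0 -> patch-63
simple_kb = 392                          # simple revisions base-0..patch-63, per du

print(f"aggregate of summary deltas: {aggregate_kb}K")
print(f"direct delta / aggregate:    {direct_kb / aggregate_kb:.2f}")
print(f"direct delta / simple revs:  {direct_kb / simple_kb:.2f}")
```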

Aaron



