Re: [Gnu-arch-users] RFC: arch protocol, smart server, and tla implementation prototypes
From: Aaron Bentley
Subject: Re: [Gnu-arch-users] RFC: arch protocol, smart server, and tla implementation prototypes
Date: Sat, 31 Jan 2004 17:40:04 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4
Tom Lord wrote:
> Excellent. So we have been using phrases like "streaming the archive
> protocol" in different ways. And what you suggest there is roughly
> what I propose as well. Slight difference: I think that pfs_get_files
> should be asynchronous and let you retrieve the files one at a time
How would that work? You'd say give me these files, then return when
the first one's ready?
You'd get background processing, and the behavior would get less
deterministic. I thought this was specifically what you didn't want! I
must be confused.
> so that client-side space requirements don't have to be the sum of the
> file sizes.
According to the thread, the aggregate storage requirement will be
roughly twice that of the equivalent DELTA. I'd be willing to make the
sacrifice of storing that data temporarily to retain the appearance of a
single-threaded program.
If you're really serious about this, I guess you could use callbacks to
compose and delete the changesets as they come in, but it looks hard to
me. What if the first changeset to complete retrieval is the one in
the middle?
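One way to answer that question: buffer out-of-order arrivals and compose only when the next changeset in sequence is available. A hypothetical sketch in Python pseudocode (the names and the callback shape are illustrative, not tla's actual C API):

```python
def compose_in_order(completions, n, compose_pair):
    """Fold n changesets into one, tolerating out-of-order arrival.

    completions: iterable of (index, changeset) pairs in *arrival* order.
    compose_pair: composes two adjacent changesets into one.
    """
    buffered = {}        # arrived but not yet next-in-sequence
    next_wanted = 0
    acc = None
    for idx, changeset in completions:
        buffered[idx] = changeset
        # drain everything that is now contiguous with what we've composed
        while next_wanted in buffered:
            cs = buffered.pop(next_wanted)
            acc = cs if acc is None else compose_pair(acc, cs)
            next_wanted += 1
    assert next_wanted == n, "some changesets never arrived"
    return acc
```

So if the middle changeset completes first, it simply waits in the buffer; peak client-side storage is bounded by the out-of-order window rather than the sum of all file sizes.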
> 2) _if_ a server can compute and return that delta directly,
> that completely solves our network latency and bandwidth
> issues. It's exactly one roundtrip and the changeset returned
> is as compressed as it could possibly be.
Some ways of composing a changeset are less space-efficient than
others. Interdiff can merge two diffs, but I believe you can also
simply concatenate them.
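The space difference between the two approaches can be illustrated with a toy model where a changeset is a mapping from file to new contents (a deliberate simplification of real diffs; interdiff itself works on unified diffs, not whole files):

```python
def concat(changesets):
    # naive "concatenation": keep every patch as-is;
    # total size is the sum of the individual patches
    return list(changesets)

def merge(changesets):
    # interdiff-style merge: a later edit to the same file
    # supersedes the earlier one, so repeated edits collapse
    # into a single entry
    merged = {}
    for cs in changesets:
        merged.update(cs)
    return merged
```

When successive commits keep touching the same files, the merged form stays roughly constant in size while the concatenated form grows linearly.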
> The caveat is, of course, server-side costs, but that's a
> worthwhile trade-off in a number of circumstances.
> 3) _even_if_ a server can _not_ compute and return that delta
> directly, so long as the /underlying transport/ is streamy,
> if the archive protocol asks for that entire delta at once,
> the N patch fetches can be streamed and the results composed
> client-side.
I believe all supported protocols can be treated as streamy. e.g.
pfs_get_files can "stream" on the native filesystem using fork(). In
theory, that provides an opportunity for the file system scheduler to
optimize its behaviour.
> That does nothing for the bandwidth problem,
> but it helps a lot with the latency problem and perfectly solves
> the client-side inefficiency.
> There's a caveat, of course: we really only win here if composing
> the incoming patches can be done fast enough to keep up with
> them as they stream in. Still -- that's a lot more likely to
> be the case than that we can make apply_changeset fast enough to
> keep up.
Oh, so compose_changeset is performed on a pair of changesets, not a
sequence? I imagined an operation that would merge a series of
changesets into a single one. For that, we'd have to wait for all
downloads to complete.
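The pairwise form does handle a sequence without waiting for all downloads: folding the pairwise compose over the changesets in order reduces N of them with N-1 compositions, and at each step only the next in-order changeset is needed. A minimal sketch (Python pseudocode for the C design, names illustrative):

```python
from functools import reduce

def compose_sequence(changesets, compose_pair):
    # a sequence reduces to repeated pairwise composition,
    # so work can start as soon as the second changeset
    # (in order) finishes downloading
    return reduce(compose_pair, changesets)
```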
> So I think the thing to do is to modify the archive protocol --
> libarch/archive.h to add a `delta' command.
Good.
> Initially, archive-pfs.c (which implements the protocol for all
> transports except arch-archd) can do nothing more than compose
> incoming patches -- that will address the client-side inefficiency.
> Later, the libarch/pfs.h vtable --- the list of functions which
> dumb-fs transports have to provide -- can be extended to include an
> asynchronous "multi-file fetch" command. And archive-pfs.c can use
> that to stream the requests for the individual changesets.
> That will address the latency issues.
Yes, I agree.
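The dumb-fs fallback Tom describes amounts to fetch-then-compose. A hypothetical sketch of what archive-pfs.c's `delta' could do, in Python pseudocode (the function names and parameters are illustrative, not tla's real API):

```python
def delta(names_between, fetch, compose_pair):
    # dumb-fs fallback for a `delta' command: fetch each
    # intermediate changeset and compose client-side
    # (names_between lists the patches from FROM to TO)
    result = None
    for name in names_between:
        patch = fetch(name)   # later: an asynchronous multi-file fetch
        result = patch if result is None else compose_pair(result, patch)
    return result
```

Swapping the serial `fetch` for the asynchronous multi-file fetch is then the second step Tom outlines, and it changes only how patches arrive, not how they compose.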
> (The order of those two steps is variable. If multi-file fetch
> shows up first, archive-pfs.c can take advantage of it in other
> ways.)
Better do it in the order you suggested, though.
> If the proposed archive-cachedelta
> command were added, then archive-pfs.c could take advantage of that
> in `delta' in order to help with bandwidth issues.
I don't understand this.
> Meanwhile: The arch-archd protocol can support the DELTA protocol
> directly. arch-archd doesn't work via the pfs layer. It doesn't
> and shouldn't translate archive.h protocol into dumb fs requests.
> The simplest thing is to implement delta directly.
Dead on.
In theory, tla could use an internal archd all the time via a pipe.
That would make the archd easier to develop and would reduce the number
of codepaths. But the end result would be less efficient in the local
case. (And just how many abstractions would that be: archd, pfs, vu...)
> Initially, an archd server can reply to DELTA just by providing a
> list of all the changesets -- the client side of archd can compose
> those. That solves latency and client-side inefficiency.
> Later, perhaps, archd servers can have a "plugin" which can
> sometimes compose some changesets server-side. That solves
> bandwidth.
Yes, or see my reply to Jason about proxies.
--------------------------------
About backwards building:
> Even within a single archive there are choices, such as
> when to use an archive cached rev vs. when to build-by-patching.
> For example, untar'ing a cacherev from a local-fs archive is generally
> faster than copying one from a library. An untar plus a couple of
> changeset applications may be preferable to copying a revision already
> present in a library.
Yes. It gets to a place where good estimates depend on
1. local hardware
2. throughput to archive
3. availability of cacherevs
4. presence of local revisions
We could tweak that for years. (Have I missed any?)
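A first cut at such an estimator could be a simple additive cost model over those factors; the numbers and function names below are purely illustrative, not a proposal for tla's actual heuristics:

```python
def estimate_seconds(transfer_bytes, bandwidth_bps, n_applies, seconds_per_apply):
    # crude additive model: network transfer time (factor 2)
    # plus patch-application time on local hardware (factor 1);
    # cacherev and local-revision availability (factors 3, 4)
    # determine which candidate plans exist at all
    return transfer_bytes / bandwidth_bps + n_applies * seconds_per_apply

def cheapest(plans):
    # plans: {description: estimated seconds}; pick the minimum
    return min(plans, key=plans.get)
```

For instance, with a 1 MB cacherev over a 1 MB/s link plus two patch applications at 0.5 s each, versus a 5 MB library copy, the model picks the untar-plus-patch plan.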
> > Looks like. 5x to 25x speedups and half the size for "summary deltas"
> > of 200 KiB does sound worthwhile. My concern is that the speedup on the
> > client side might be produced by pushing that CPU cost onto the
> > server.
> A few things: summary deltas don't _have_ to be computed server-side.
> Part of the beauty of the idea is that it works out fine even on
> dumb-fs servers if users can manage them explicitly (just like
> cacherevs).
> > Do we know that composing changesets will be significantly cheaper? If
> > they are, we should be doing it client-side too.
> We _don't_, in an empirical sense, know that composing changesets will
> be significantly cheaper; however, I think it's a safe bet. Changeset
> size is (roughly, and only in the expected case) a function of the
> amount of human work that goes into one commit -- relatively constant
> across projects. Tree-size is arbitrary. Faced with some
> O(tree-size) algorithms that can be avoided by changeset composition,
> there are two strategies: try to eliminate the O(tree-size) steps from
> the algorithms, or use changeset composition. You and others have
> done a lot of work so far on eliminating O(tree-size) steps (and
> that's very good) --- the other path wants exploring, too.
It will be a little tricky.
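A back-of-envelope comparison makes the asymptotic argument concrete. The figures below are invented for illustration; only the shape of the ratio matters:

```python
# illustrative numbers only: tree size is arbitrary, while
# changeset size tracks the human work in one commit
tree_size = 50_000          # files in the tree
changeset_size = 20         # files touched per commit
n = 10                      # revisions to traverse

apply_work = n * tree_size           # O(tree-size) per apply_changeset
compose_work = n * changeset_size    # O(changeset-size) per composition
ratio = apply_work // compose_work   # how much work composition avoids
```

With those (made-up) numbers, composition does 2500x less per-file work than repeated whole-tree application, which is why the bet looks safe even without measurements.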
> > - may, depending on implementation, require users to install
> > interdiff
> It will add new code for diff composition to tla, but I don't see any
> good reason to add a new dependency.
So do you propose simple concatenation of diffs? Or do we absorb
interdiff into arch? We could write our own, but I don't see why.
> > - looks about as complex as those optimizations
> I'm not sure I agree, just because patchset composition is more like a
> math problem and less like a "what weird things can happen on the
> filesystem" problem.
> Also, you left out that compose_changeset is useful server-side and
> for `changeset-utils'.
I know not this "changeset-utils" of which you speak. Would this
include standalone tools for creating and applying changesets?
Aaron