
Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon


From: Christian Ohler
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Fri, 01 Jun 2007 13:38:18 +0200

Nathaniel Smith, 2007-05-30:

> In my world, the reason we need partial pull is that the total history
> size of a project grows without bound.  Therefore, for very large and
> old projects (Linux kernel, *BSD, Mozilla, gcc, glibc, maybe a few
> others), the full history database may be many times larger than a
> checkout.  It is unreasonable to expect new developers to, before
> writing their first patch, download several gigabytes of data.

This looks like two separate issues to me:

(1) The total history size of a project in monotone grows without bound.
(2) It takes too long for a new developer to get a local workspace of a project with monotone.

As far as I can tell, problem (1) on its own isn't affecting anyone right now -- even though a handful of existing projects would run into it should they ever convert their history to monotone. Problem (1) does imply problem (2) in theory, but the real reason typical projects have problem (2) right now is unrelated to problem (1): mtn pull is too CPU-intensive and/or doesn't pipeline properly.

The main reason problem (1) is being discussed seems to be that the proposals on how to solve it ("partial pull") will hopefully also make problem (2) less relevant. But there may be solutions to (2) ("initial pull") that are much simpler than partial pull. Simpler in that they don't force us to think about incomplete history.

In fact, what the Pidgin project is doing (download compressed mtn database snapshots over HTTP) is a solution to (2) that doesn't solve (1). Too bad mtn isn't smart enough to offer similar efficiency for this particular case. It's a special case, but it's the case that matters.

A complete pull of Pidgin's current database transfers 120 MB. Is this the size of history that we want to give up on and recommend partial pull for? That doesn't seem very satisfactory. It's nowhere near the several gigabytes of history that Nathaniel is calling an unreasonable size. It should be within the range that mtn pull can deal with. Partial pull would just be a workaround for mtn's inefficient pull mechanism.

Maybe it's just a matter of optimizing the roster manipulation code. Or maybe there's a way to avoid or defer some of the work that the code is currently doing during pull. Maybe there's a way to short-circuit the expensive roster manipulation and just copy node ids from the server (with some simple adjustments) if the local database does not contain any revisions connected to the subgraph being pulled?
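To make the short-circuit idea concrete, here is a minimal sketch -- not monotone's actual code, and all names (`pull_node_ids`, `local_revisions`, `remote_nodes`) are illustrative assumptions. It contrasts the hypothetical fast path (copy the server's node ids with a simple offset when the local database holds nothing connected to the pulled subgraph) with the slow path of allocating ids one by one while replaying roster deltas:

```python
def pull_node_ids(local_revisions, local_next_id, remote_nodes):
    """Assign local node ids for incoming nodes.

    remote_nodes maps a remote node id to its node data.  If the
    local database contains no revisions connected to the incoming
    subgraph, the server's ids can be copied with a simple offset
    instead of being re-derived through roster manipulation.
    """
    if not local_revisions:
        # Fast path: keep the server's id assignment, shifted past
        # any ids already allocated locally.
        offset = local_next_id
        return {rid: rid + offset for rid in remote_nodes}

    # Slow path (roughly what happens today): allocate fresh ids
    # one at a time while rebuilding rosters.
    mapping = {}
    next_id = local_next_id
    for rid in sorted(remote_nodes):
        mapping[rid] = next_id
        next_id += 1
    return mapping

# An empty local database takes the fast path: ids come through
# unchanged (offset 0), so no roster replay is needed.
fresh = pull_node_ids(local_revisions=[], local_next_id=0,
                      remote_nodes={10: "file_a", 11: "file_b"})
```

The point of the sketch is only that the fast path is a pure copy, which is why an initial pull into an empty database could skip the expensive work entirely.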

Christian.



