
Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon


From: Christian Ohler
Subject: Re: [Monotone-devel] partial pull #2 - gaps instead of a single horizon
Date: Fri, 01 Jun 2007 13:38:18 +0200

Nathaniel Smith, 2007-05-30:

> In my world, the reason we need partial pull is that the total history
> size of a project grows without bound.  Therefore, for very large and
> old projects (Linux kernel, *BSD, Mozilla, gcc, glibc, maybe a few
> others), the full history database may be many times larger than a
> checkout.  It is unreasonable to expect new developers to, before
> writing their first patch, download several gigabytes of data.

This looks like two separate issues to me:

(1) The total history size of a project in monotone grows without bound.
(2) It takes too long for a new developer to get a local workspace of a project with monotone.

As far as I can tell, problem (1) on its own isn't affecting anyone right now -- even though a handful of existing projects would run into it should they ever convert their history to monotone. Problem (1) does imply problem (2) in theory, but the real reason typical projects have problem (2) right now is unrelated to problem (1): mtn pull is too CPU-intensive and/or doesn't pipeline properly.

The main reason problem (1) is being discussed seems to be that the proposals on how to solve it ("partial pull") will hopefully also make problem (2) less relevant. But there may be solutions to (2) ("initial pull") that are much simpler than partial pull. Simpler in that they don't force us to think about incomplete history.

In fact, what the Pidgin project is doing (download compressed mtn database snapshots over HTTP) is a solution to (2) that doesn't solve (1). Too bad mtn isn't smart enough to offer similar efficiency for this particular case. It's a special case, but it's the case that matters.

A complete pull of Pidgin's current database transfers 120 MB. Is this the size of history that we want to give up on and recommend partial pull for? That doesn't seem very satisfactory. It's nowhere near the several gigabytes of history that Nathaniel is calling an unreasonable size. It should be within the range that mtn pull can deal with. Partial pull would just be a workaround for mtn's inefficient pull mechanism.

Maybe it's just a matter of optimizing the roster manipulation code. Or maybe there's a way to avoid or defer some of the work that the code is currently doing during pull. Maybe there's a way to short-circuit the expensive roster manipulation and just copy node ids from the server (with some simple adjustments) if the local database does not contain any revisions connected to the subgraph being pulled?
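To make the short-circuit idea concrete, here is a minimal sketch -- not monotone's actual code, and all names (`pull_node_ids`, `local_revisions`, `remote_nodes`) are illustrative assumptions. It contrasts the hypothetical fast path (copy the server's node ids with a simple offset when the local database holds nothing connected to the pulled subgraph) with the slow path of allocating ids one by one while replaying roster deltas:

```python
def pull_node_ids(local_revisions, local_next_id, remote_nodes):
    """Assign local node ids for incoming nodes.

    remote_nodes maps a remote node id to its node data.  If the
    local database contains no revisions connected to the incoming
    subgraph, the server's ids can be copied with a simple offset
    instead of being re-derived through roster manipulation.
    """
    if not local_revisions:
        # Fast path: keep the server's id assignment, shifted past
        # any ids already allocated locally.
        offset = local_next_id
        return {rid: rid + offset for rid in remote_nodes}

    # Slow path (roughly what happens today): allocate fresh ids
    # one at a time while rebuilding rosters.
    mapping = {}
    next_id = local_next_id
    for rid in sorted(remote_nodes):
        mapping[rid] = next_id
        next_id += 1
    return mapping

# An empty local database takes the fast path: ids come through
# unchanged (offset 0), so no roster replay is needed.
fresh = pull_node_ids(local_revisions=[], local_next_id=0,
                      remote_nodes={10: "file_a", 11: "file_b"})
```

The point of the sketch is only that the fast path is a pure copy, which is why an initial pull into an empty database could skip the expensive work entirely.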

Christian.



