Re: [Monotone-devel] Scalability question


From: Justin Patrin
Subject: Re: [Monotone-devel] Scalability question
Date: Fri, 4 Aug 2006 15:26:02 -0700

On 8/4/06, Timothy Brownawell <address@hidden> wrote:
On 8/4/06, Jonathan S. Shapiro <address@hidden> wrote:
> If I understand the documents correctly, there are a whole lot of places
> in the monotone schema that are very similar to things we did in OpenCM.
> One of these bit us badly on scalability. I want to identify the issue,
> explain how it bit us, and ask whether it has been a problem in
> monotone. If not, why not?
>
> The Monotone "Manifest" is directly equivalent to the OpenCM "Change"
> object. We went through various iterations on our Change objects, and we
> hit two scalability issues. The first arises with very large projects.
> The second impacts initial checkout (in monotone, it would probably
> arise in push/pull rather than checkout).
>
> Like monotone, OpenCM does not store entries for directories; they are
> implicit in the file paths. In contrast to Monotone, OpenCM adds a level
> of indirection between our Change records and our Content objects. The
> intermediate object is called an Entity. It stores the (file-name,
> content-sha1) pair and a couple of other things that aren't important
> for this question.
>
> Consider a mid-sized project such as EROS, which has ~20,000 source
> files. [For calibration, OpenBSD is *much* larger]. This means 20,000
> sha-1's in the Manifest/Change. In OpenCM, these are stored in binary
> form, so each sha-1 occupies 20 bytes, and the resulting Change object
> is about 400 kilobytes.
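
A rough sketch of the layout described above -- the type and field names
here are illustrative, not OpenCM's actual ones -- together with the
back-of-envelope arithmetic:

    from dataclasses import dataclass

    # Hypothetical names: one Entity per source file, pairing a path with
    # the binary SHA-1 of its content; a Change is essentially a list of them.
    @dataclass
    class Entity:
        file_name: str
        content_sha1: bytes      # 20 bytes in binary form

    @dataclass
    class Change:
        entities: list           # roughly one Entity per source file

    # ~20,000 files (an EROS-sized tree) x 20 bytes per SHA-1:
    print(20_000 * 20)           # 400,000 bytes, i.e. roughly 400 kilobytes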

Internally, we don't really use manifests (much) anymore. Instead we
use "rosters", which are private manifest-plus-merge-metadata objects.
We currently store them as plaintext, but have been considering
storing them as sets of database rows for performance reasons.
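
Very roughly, a roster entry can be pictured as a manifest entry plus
per-node merge bookkeeping; the field names below are guesses at the kind
of metadata involved, not monotone's actual roster format:

    from dataclasses import dataclass, field

    @dataclass
    class RosterNode:
        path: str
        content_sha1: str      # what a plain manifest entry already has
        birth_revision: str    # revision in which this node first appeared
        content_marks: set = field(default_factory=set)  # revisions that last set the content
        name_marks: set = field(default_factory=set)     # revisions that last set the name

    # A roster is then one such node per file, with one roster per revision.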

> This particular object sees a lot of delta computations, and simply
> reading and writing it takes a noticeable amount of time. Also, the need
> to sync a 400 kbyte object in order to begin a checkout is very
> disconcerting to users -- especially when you are doing it over a slow
> link at (e.g.) a hotel or over a PPP link [Yes, a lot of people really
> still use dial-up].

We don't send manifests (or rosters) over the network. Instead we send
revisions, which include a list of changes (add, drop, rename, patch,
> etc) against the parent revision(s).
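
The point is that the change list is proportional to the size of the edit,
not the size of the tree: given the parent's file-to-hash map, the receiver
can rebuild the child locally. A minimal sketch (operation names are
illustrative, not monotone's wire format):

    def apply_changes(parent_manifest, changes):
        """Rebuild a child manifest (path -> content hash) from a parent
        manifest plus a list of (op, args...) tuples."""
        child = dict(parent_manifest)
        for op, *args in changes:
            if op == "add":
                path, content_hash = args
                child[path] = content_hash
            elif op == "drop":
                (path,) = args
                del child[path]
            elif op == "rename":
                old_path, new_path = args
                child[new_path] = child.pop(old_path)
            elif op == "patch":
                path, new_hash = args
                child[path] = new_hash
        return child

    # A one-file patch to a 20,000-file tree travels as a single entry
    # rather than as a fresh ~400 KB manifest.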

> I am interested to know if this has been a scalability issue in
> monotone? What performance result might I expect if I load EROS into
> monotone?

It would probably be kinda slow. I sorta recall that it's slow for
OpenEmbedded, but I think they're still using 0.25 (before our change
to using rosters instead of manifests internally), so more recent
versions might be less slow.


OpenEmbedded recently migrated to 0.27 (I'm using 0.28 myself).
Pull/push is much much faster. Update is faster but still fairly slow.
It takes a while to select the branch to update against (I don't
know why this should take so long...), then another while to choose an
update target revision. The actual update doesn't take too long.

> If it *has* been a scalability issue, I have some hindsight suggestions
> to offer based on the OpenCM experiences, but I don't want to seem
> pushy.

We have seen some slowness, yes. Our current thinking is to store our
rosters as table rows. This lets us store one as only the rows that
differ from its parent(s), which will speed up taking/applying deltas.
It also saves us from having to parse them to/from the plaintext format
as much. They don't cause large network transfers, because they're not
sent over the network.
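
A sketch of the "only the changed rows" idea, under the assumption (not
monotone's actual schema) that each revision stores a small dict of added,
changed, or deleted rows keyed by path, plus an occasional fully
materialised roster to start from; a single-parent chain is assumed for
simplicity:

    def reconstruct_roster(revision, full_rosters, row_deltas, parent_of):
        # Walk back to the nearest revision whose roster is stored in full...
        chain = []
        rev = revision
        while rev not in full_rosters:
            chain.append(rev)
            rev = parent_of[rev]
        roster = dict(full_rosters[rev])
        # ...then replay each revision's changed rows, oldest first.
        for rev in reversed(chain):
            for path, row in row_deltas[rev].items():
                if row is None:          # row dropped in this revision
                    roster.pop(path, None)
                else:
                    roster[path] = row
        return roster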

Yes, suggestions are always welcome.

Tim





--
Justin Patrin



