[Monotone-devel] Re: Support for binary files, scalability and Windows p

From:	graydon hoare
Subject:	[Monotone-devel] Re: Support for binary files, scalability and Windows port
Date:	Tue, 20 Jan 2004 13:25:18 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Asger Kunuk Ottar Alstrup wrote:

> In order to represent this accurately, in the face of distributed use, I
> think you need to represent every single change as an edge in the graph
> somehow, and the order in which they happened. In other words, you
> effectively have to record an ordering of your back-edges or
> cancellation edges.

well, in a sense you're right. I don't want to belabor the point too much, except to point out again that the DAG is kept over *manifest* versions, not file versions. so for example if you add even 1 bit to a ChangeLog file on each revision, the ChangeLog SHA1 changes, and the manifest ID changes, and I have distinct nodes in my graph again.
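
A minimal sketch of that point, assuming a manifest is simply a sorted list of (file SHA1, path) lines and the manifest ID is the SHA1 of that text. The line format and the names `file_id` and `manifest_id` are illustrative assumptions, not monotone's actual on-disk format; the sketch also appends a whole byte rather than a single bit.

```python
import hashlib
from typing import Dict


def file_id(content: bytes) -> str:
    """SHA1 of a file's contents, as a hex string."""
    return hashlib.sha1(content).hexdigest()


def manifest_id(files: Dict[str, bytes]) -> str:
    """SHA1 of a manifest text listing every file's SHA1 and path.

    The "<sha1>  <path>" line format is an assumption for illustration.
    """
    lines = [f"{file_id(data)}  {path}\n" for path, data in sorted(files.items())]
    return hashlib.sha1("".join(lines).encode("utf-8")).hexdigest()


tree_v1 = {
    "ChangeLog": b"initial import\n",
    "main.c": b"int main(void) { return 0; }\n",
}
# Same tree, with one extra byte appended to the ChangeLog only.
tree_v2 = dict(tree_v1, ChangeLog=tree_v1["ChangeLog"] + b".")

# main.c keeps its file ID, but the manifest ID differs, so the two
# trees would be distinct nodes in a graph keyed by manifest ID.
print(file_id(tree_v1["main.c"]) == file_id(tree_v2["main.c"]))
print(manifest_id(tree_v1) == manifest_id(tree_v2))
```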

granted, this is a bit of a cheap hack; it's just (a) simple and (b) in the hands of the user. if they want to incorporate the date and time of the last revision -- or a UUID for that matter -- into the notion of a "version", it's as simple as making sure it shows up in an easily-merged file somewhere in the manifest.

it's a simple model, and simplicity is important to me: if monotone's model of something grows too complex, my reasoning about the model gets weak and error-prone, not to mention it becomes harder to explain the model to users. since users like to consider version control "very permanent and safe", it's important for them to understand what it's doing beneath the covers, at least in general.

> That is a good proposal, and that might work for video files. I think I
> need to give you a little background of where I am coming from.

> ...

ahh, here is the juicy part. your needs are clearly formidable, and you are willing to dedicate some effort to solving them. fair enough. let me split what I see as your requirements into 3 sections:


 - the need to mark some files as "opaque", in the sense that they are
   not necessarily scanned for common substructure with their own past
   versions or neighbours, not gzipped, not merged.

 - the need to support very large files: overcoming the 16mb limit in
   the database, and removing any cases in which files are loaded into
   memory in their entirety.

 - the need, possibly, to change the way files are identified for one
   of two reasons: hashing takes too long, and (possibly) there are
   unacceptable failure cases in history graphs built from hashes.

I can imagine handling "opaqueness" with a hook: call the hook with a pathname (or other identifier), and if it returns true, monotone always stores and sends complete versions (no xdelta or similar-block scanning) and doesn't bother gzipping. not too much effort to implement. we'd need to locate all the places we make assumptions about gzip and xdelta, and predicate them on the hook.
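
A rough sketch of what predicating storage on such a hook could look like. Everything here is hypothetical: the hook name `is_opaque`, the file extensions it matches, and the `("full" | "gzip", payload)` record shape are assumptions for illustration only. The xdelta side is left out; only the gzip decision is shown.

```python
import gzip
from typing import Callable, Tuple

Record = Tuple[str, bytes]


def is_opaque(path: str) -> bool:
    """Hypothetical user hook: treat some media extensions as opaque."""
    return path.lower().endswith((".avi", ".mpg", ".iso"))


def encode_for_storage(
    path: str, data: bytes, opaque_hook: Callable[[str], bool] = is_opaque
) -> Record:
    """Store opaque files whole and uncompressed; gzip everything else."""
    if opaque_hook(path):
        return ("full", data)
    return ("gzip", gzip.compress(data))


def decode_from_storage(record: Record) -> bytes:
    """Invert encode_for_storage using the tag stored with the payload."""
    kind, payload = record
    return payload if kind == "full" else gzip.decompress(payload)


video = encode_for_storage("clips/intro.avi", b"\x00\x01\x02" * 100)
text = encode_for_storage("README", b"hello world\n" * 100)
print(video[0], text[0])
```

Storing the tag next to the payload means readers never need to call the hook again, so a later change to the hook does not make old records unreadable.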

I think we're on our way to supporting large files. breaking the 16mb barrier is probably the easy part since it can be confined to the storage system. if we're going to a block-collection model for storage anyways, that would buy you 16mb of block commands. say each block command is 128 bits, then you can fit a million of those in an existing 16mb fragment, so you might be able to store say files of 16tb in size.
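
The arithmetic behind the 16tb figure can be checked with a few lines. One assumption is needed that the message does not state outright: each block command is taken to refer to a data block of up to 16mb, the same as the existing fragment limit. Sizes are treated as binary units.

```python
MIB = 1024 * 1024
TIB = 1024 ** 4

FRAGMENT_BYTES = 16 * MIB   # the existing 16mb limit per stored fragment
COMMAND_BYTES = 128 // 8    # one 128-bit block command is 16 bytes
BLOCK_BYTES = 16 * MIB      # assumption: each command points at a block of up to 16mb

# How many block commands fit in one fragment ("a million of those").
commands_per_fragment = FRAGMENT_BYTES // COMMAND_BYTES

# Largest file addressable by a single fragment full of block commands.
max_file_bytes = commands_per_fragment * BLOCK_BYTES

print(commands_per_fragment)        # 1048576
print(max_file_bytes // TIB)        # 16
```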

removing all the places where we assume we can load a file into memory might be hard, might not be. if you can live with loading the "top" item in a file -- the up-to-16mb block-command list -- into memory all at once, we only need to change places where we "reach inside" that data, rather than all possible references. or, if even that is too expensive, we could possibly make the data object lazy, so that it keeps a small memory cache of its own sections, and loads/flushes them on demand. complex, but doable.
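
One way to picture the lazy data object is a read-only wrapper that fetches fixed-size sections through a loader callback and keeps only a few of them in memory. This is a sketch under assumptions: the class name `LazyData`, the `load_section(index)` callback, and the least-recently-used eviction policy are all hypothetical, and writing back ("flushing") modified sections is not covered.

```python
from collections import OrderedDict
from typing import Callable


class LazyData:
    """Read-only view of a large object, loaded one section at a time."""

    def __init__(self, size: int, load_section: Callable[[int], bytes],
                 section_size: int = 4096, max_cached: int = 4) -> None:
        self.size = size
        self._load = load_section          # load_section(index) -> bytes of that section
        self._section_size = section_size
        self._max_cached = max_cached
        self._cache: "OrderedDict[int, bytes]" = OrderedDict()
        self.loads = 0                     # number of real loads, for inspection

    def _section(self, index: int) -> bytes:
        if index in self._cache:
            self._cache.move_to_end(index)     # mark as most recently used
            return self._cache[index]
        data = self._load(index)
        self.loads += 1
        self._cache[index] = data
        if len(self._cache) > self._max_cached:
            self._cache.popitem(last=False)    # drop the least recently used section
        return data

    def read(self, offset: int, length: int) -> bytes:
        """Return up to `length` bytes starting at `offset`."""
        end = min(offset + length, self.size)
        out = bytearray()
        while offset < end:
            index, start = divmod(offset, self._section_size)
            take = min(self._section_size - start, end - offset)
            out += self._section(index)[start:start + take]
            offset += take
        return bytes(out)


# Stand-in for storage: 16384 bytes served in 4096-byte sections.
backing = bytes(range(256)) * 64
obj = LazyData(len(backing), lambda i: backing[i * 4096:(i + 1) * 4096], max_cached=2)
print(obj.read(4090, 12) == backing[4090:4102], obj.loads)
```

Callers that only "reach inside" a small range touch one or two sections, so memory use is bounded by `section_size * max_cached` rather than by the size of the file.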

finally, the change of identifier type: again, I am wary of the indirection-table approach, so I am trying to consider alternatives. I think this could be done with a hook. the calculate_identifier() calls could be changed to depend on a hook which optionally picks some non-SHA1 way of calculating identifiers. then if you have something else in mind you can use it. it would require all the users of a given project to have that hook installed, but otherwise monotone would be completely ignorant of your chosen strategy.
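
A sketch of how an optional identifier hook might slot in. The function name `calculate_identifier` comes from the message, but its signature here, the `(path, data)` hook shape, and the example hook are assumptions made for illustration. The example hook swaps in SHA-256 for one file type purely to show the dispatch; it is not a suggestion for a faster or safer scheme.

```python
import hashlib
from typing import Callable, Optional

# Hypothetical hook type: return an identifier string, or None to decline.
IdentifierHook = Callable[[str, bytes], Optional[str]]


def calculate_identifier(path: str, data: bytes,
                         hook: Optional[IdentifierHook] = None) -> str:
    """Use the hook's identifier if it supplies one, else fall back to SHA1."""
    if hook is not None:
        custom = hook(path, data)
        if custom is not None:
            return custom
    return hashlib.sha1(data).hexdigest()


def example_hook(path: str, data: bytes) -> Optional[str]:
    """Hypothetical project hook: a non-SHA1 identifier for .avi files only."""
    if path.endswith(".avi"):
        return "sha256:" + hashlib.sha256(data).hexdigest()
    return None


print(calculate_identifier("notes.txt", b"abc"))
print(calculate_identifier("clip.avi", b"abc", hook=example_hook))
```

Because identifiers only agree between users who compute them the same way, every participant in a project would need the same hook installed, as the message notes.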

would this set of changes satisfy your needs? I would be happy to accommodate these as they are mostly hidden from smaller-scale users, and can be described in "advanced use" sections of the manual. they are the sort of compromise unlikely to cause mainstream breakage or unnecessary multiplication of ideas in the easy cases.


-graydon
