[Monotone-devel] Re: Support for binary files, scalability and Windows p

From:	graydon hoare
Subject:	[Monotone-devel] Re: Support for binary files, scalability and Windows port
Date:	Mon, 19 Jan 2004 02:27:28 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Asger Kunuk Alstrup wrote:

Anyway, if you are considering to change the fundamental data-structure to a
tree of hashes or something else, please consider to have two different "hash"
functions if that at all makes sense: One that is to be used when you need a
unique identifier to identify a "version" (which I would suggest should just be
a timestamp along with a random string of bits), and another when you need a
short extract that can be used to find data. That way, you could optimise the
data structure for large file support: Only use linear time scans over the data
when you really have to.

no, I cannot do this. it adds too much fragility. monotone is a distributed system with no lock-step synchronization; this means that you and I can perform actions in parallel (adding the same file, merging two trees into a third) and make assertions about those actions which will be meaningful when exchanged with a third party.

what's good about identifying things by content hash is that you and I will always construct the *same* content hash for a given object, even when we are not explicitly communicating. we might be hours, days, years away from reconciling our work, yet we chose the same identifiers. if I make some other UUID which I bind to a content hash by attribution, I need to hold the SHA1<->UUID mapping tables for all the people ever involved in my VC system: that set of mappings becomes a critical piece of indirection, and if that indirection breaks or is corrupted the system falls apart.
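
The point about uncoordinated agreement can be illustrated with a minimal sketch. This is not monotone's code: `content_id` and the sample file contents are hypothetical names chosen for illustration, and SHA-1 is used only because the message mentions it.

```python
import hashlib
import uuid


def content_id(data: bytes) -> str:
    """Identifier derived only from the bytes themselves (SHA-1 hex digest)."""
    return hashlib.sha1(data).hexdigest()


# Two people add the same file without talking to each other.
alice_file = b"int main() { return 0; }\n"
bob_file = b"int main() { return 0; }\n"

# Content hashes agree with no coordination at all.
alice_id = content_id(alice_file)
bob_id = content_id(bob_file)
same_by_content = alice_id == bob_id

# Randomly generated UUIDs do not agree, so somebody has to keep a
# UUID -> hash mapping table and ship it to every participant.
alice_uuid = uuid.uuid4()
bob_uuid = uuid.uuid4()
uuid_to_hash = {alice_uuid: alice_id, bob_uuid: bob_id}
```

Here `uuid_to_hash` plays the role of the mapping table: two entries that point at one piece of content, and a structure that every party would need to hold intact for the identifiers to stay meaningful.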

I know this means that monotone will be limited by the speed of hashing data, and that this will hurt if you're using it for storing video. as I said, you're welcome to work out a different way to identify data based on its content (if you like, put your own favourite UUID in metadata tags inside the file, and extract them on the fly). but I'm not going to add extra code paths to manage another level of indirection for this case.
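
To make the cost concrete, here is a rough sketch of hashing a large object in fixed-size chunks. It is an illustration only: `hash_stream` and the chunk size are assumptions, not anything taken from monotone. Every byte passes through the hash exactly once, so the work grows linearly with the size of the data, which is the limit described above.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=65536):
    """Return (sha1 hex digest, bytes read) for a binary file-like object."""
    h = hashlib.sha1()
    bytes_read = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        h.update(chunk)
        bytes_read += len(chunk)
    return h.hexdigest(), bytes_read


# Stand-in for a large binary file: one million zero bytes held in memory.
payload = b"\x00" * 1_000_000
digest, n = hash_stream(io.BytesIO(payload))
```

Chunked reading keeps memory use bounded, but it does not reduce the number of bytes that must be hashed.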

(or, of course, being free software you are always free to make a derivative work of monotone; I'm just not committing my own time to it)

I still think that the current hash-approach has another downside: In source
control, you often need to revert a file to a previous state. This will result
in the same hash for the file, although it is technically not the same.
Therefore, I would prefer a random string of bits and a time-stamp to identify
versions, in order to avoid these collisions, and in order to avoid linear time
scans of the files.

I understand your concern here, but I cannot currently suppress my desire to keep identifiers in a (probabilistic) bijection with data. my feeling is that reverted files *are* technically the same as their previous state: the historical story about "this file reverted from this other file" is an external, attributed declaration. it is not an intrinsic aspect of the file, nor is the timestamp+random id of the file's creation.
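
A small sketch of that separation, under stated assumptions: `content_id`, the `history` records, and their field names are hypothetical and do not reflect how monotone stores anything. The reverted content gets the same identifier as the original, while the "reverted from" story lives in separate, attributed records that refer to identifiers.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identifier derived only from the bytes themselves (SHA-1 hex digest)."""
    return hashlib.sha1(data).hexdigest()


v1 = b"hello\n"          # original state
v2 = b"hello, world\n"   # edited state
v3 = v1                  # reverted back to the original state

ids = [content_id(v) for v in (v1, v2, v3)]

# History is stated about identifiers, by someone; it is not part of the file.
history = [
    {"from": ids[0], "to": ids[1], "author": "alice", "note": "edit"},
    {"from": ids[1], "to": ids[2], "author": "alice", "note": "revert"},
]
```

In this sketch `ids[0]` and `ids[2]` are equal, and the only place the revert is visible is the second `history` record.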


of course, reasonable people may disagree.

-graydon
