

From: graydon hoare
Subject: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Mon, 19 Jan 2004 11:42:10 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Asger Kunuk Ottar Alstrup wrote:

> So, in summary: I appreciate the time you are spending on monotone, and
> I appreciate the friendly and open tone with which you approach this

hey, so long as we can keep the discussion in terms of "finding a satisfactory compromise" rather than "forcing one party to admit he's wrong", I'm sure we'll get along fine. I only mentioned a code fork as an afterthought, in the sense of "if you find that I am being intolerably unreasonable, there's always this route..." :)

> This is because I do not think this use case has priority over the use
> case where I revert a change, but another party does not.

ah, right. again, my background concerns are all about source code, in which the concern is to make merging robust and painless (well, and doing strong QA, which also benefits from primacy of hashed identifiers), so I very much feel that it takes priority.

in any case, reversion of a manifest can be represented as a back-edge in the ancestry version graph or a cancellation of the forward edge (and note: you'd have to revert the entire manifest, not just a file, because only manifests are chained together in a history graph).
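
A minimal sketch of the two representations mentioned above, assuming manifests are identified by the SHA-1 of their text. The `Ancestry` class, its method names, and the manifest strings are hypothetical illustrations, not monotone's actual data structures. Because the id equals the hash, reverting a manifest reproduces the parent's id, so the revert is either an edge pointing back at the parent or the removal of the forward edge.

```python
import hashlib


def manifest_id(manifest_text):
    """Id of a manifest: the SHA-1 of its text (assumed for this sketch)."""
    return hashlib.sha1(manifest_text.encode("utf-8")).hexdigest()


class Ancestry:
    """Hypothetical ancestry graph: a set of (parent_id, child_id) edges."""

    def __init__(self):
        self.edges = set()

    def commit(self, parent_text, child_text):
        edge = (manifest_id(parent_text), manifest_id(child_text))
        self.edges.add(edge)
        return edge

    def revert_as_back_edge(self, parent_text, child_text):
        # The reverted manifest has the parent's content, hence the parent's
        # id, so the new edge runs from the child back to the parent.
        return self.commit(child_text, parent_text)

    def revert_as_cancellation(self, parent_text, child_text):
        # Alternative: drop the forward edge instead of adding a new one.
        self.edges.discard((manifest_id(parent_text), manifest_id(child_text)))
```

Note that both operations act on whole manifests, matching the point that only manifests are chained together in the history graph.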

> So, my suggestion is to separate concerns: Identify each version of a
> file with a truly unique identifier in the version DAG, and then have a
> separate scheme for representing each version of a file in a compact and
> efficient way.

hmm. I read this as an elaboration of what you had in mind already: making identifiers into semi-random UUIDs which are mapped to hashes, rather than equal to hashes. while logically *possible*, I still don't see any way to do this without the associated costs:

   - all operations on an "id" (comparison, i/o, synchronization) go
     via an indirection table which associates hashes with UUIDs

   - writing code to construct this table, synchronize it, evaluate its
     trust, etc. and modifying all operations to use it is a
     considerable amount of work

   - this table is a new point of failure in the system, and a new
     vector for attacks which play with trust relationships

   - the intrinsic integrity-checking associated with frequent hashing
     is lost
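
To make the contrast in the list above concrete, here is a rough sketch assuming SHA-1 content hashes. `content_id`, `verify_content_id`, and `UuidTable` are hypothetical names invented for illustration; none of this is monotone code. With ids equal to hashes, checking a blob needs only the blob; with UUIDs, every check goes through a table that must exist, be synchronized, and be trusted.

```python
import hashlib
import uuid


def content_id(data):
    """Identifier that is equal to the SHA-1 hash of the content."""
    return hashlib.sha1(data).hexdigest()


def verify_content_id(ident, data):
    # No table needed: rehashing the data is the whole integrity check.
    return content_id(data) == ident


class UuidTable:
    """Hypothetical indirection table mapping random UUIDs to hashes."""

    def __init__(self):
        self.uuid_to_hash = {}

    def register(self, data):
        ident = str(uuid.uuid4())
        self.uuid_to_hash[ident] = content_id(data)
        return ident

    def verify(self, ident, data):
        # The answer now depends on the table being present and correct.
        expected = self.uuid_to_hash.get(ident)
        return expected is not None and expected == content_id(data)
```

Two parties who hash the same bytes arrive at the same `content_id` independently, while two `register` calls on the same bytes produce different UUIDs that only agree once the tables have been exchanged.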

these, to me, are heavy costs. let me present an alternative which, from my perspective, shifts the workload to the user who has this (imo unusual) need to version-control large video files without ever hashing or merging them:

     do version control on directories full of small text files which
     contain nothing but the UUID of a video file, or better yet a URL.
     make a persistent attribute in .mt-attrs which treats each file as
     a request to have the associated large file transferred from a
     video server via wget and stored in a bucket of video files in
     ~/.huge-video-files, then symlinked into place in your
     configuration tree.
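
A rough sketch of the fetch-and-symlink step described above. It is not wired into .mt-attrs, it uses Python's `urllib.request.urlretrieve` in place of wget, and the `materialize` function, its parameters, and the choice to name bucket entries by a hash of the URL are all assumptions made for illustration.

```python
import hashlib
import os
import urllib.request


def materialize(pointer_path, link_path, bucket_dir,
                fetch=urllib.request.urlretrieve):
    """Fetch the file named by a pointer file into a bucket, then symlink it."""
    # The version-controlled file holds nothing but a URL.
    with open(pointer_path, encoding="utf-8") as f:
        url = f.read().strip()

    os.makedirs(bucket_dir, exist_ok=True)
    # Assumption: bucket entries are named by a hash of the URL, which is
    # cheap to compute; the large file itself is never hashed.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    target = os.path.join(bucket_dir, name)

    if not os.path.exists(target):
        tmp = target + ".part"
        fetch(url, tmp)          # download under a temporary name first
        os.replace(tmp, target)  # rename so a partial download is never linked

    # Replace a stale symlink, then link the bucket entry into the tree.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(target, link_path)
    return target
```

Calling it as `materialize("clips/intro.url", "clips/intro.avi", os.path.expanduser("~/.huge-video-files"))` (paths hypothetical) would download once and reuse the bucket entry on later calls.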

in this "solution" to the problem, monotone is not changed at all. it is doing what it was built to do. you can still use it to manage configurations of your data, evaluate the trustworthiness of various configurations, checkpoint and restore them, trade them with friends, etc. but you have decided that your data are *so big* that loading them into memory and hashing them costs too much (not to mention doing common subsequence searches on the bits during storage, or pointlessly gzipping them in-memory, or doing all transmission via base64), and that being able to merge video files is an insufficient benefit. so you moved the files themselves out of monotone's storage management.

