From: Asger Kunuk Ottar Alstrup
Subject: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Thu, 15 Jan 2004 09:39:13 +0100

>> Secondly, does monotone scale well? Does it handle files bigger than
>> 50MB, and can it handle archives that are several GB big?
> not yet to #1 (current limit on files is 2^24 bytes = 16mb), yes to #2
> (current database limit is 2^41 bytes, I think). there are also some
> scalability issues with very large history graphs (gcc's entire
> history can be imported, but operations are slow) and very large file
> trees (sha1 hashing every file in a multi-gigabyte tree takes a
> while). 

OK. The 16 MB limit is a showstopper for us, and the hashing cost
probably is as well. We are working with raw video files that are on
the order of 100 MB each.

Do you have an impression of what the consequences would be of letting
some files use automatically generated unique identifiers, or partial
hashes, instead of full hashes? In other words, how much of the code
relies on the fact that the IDs are hashes of the complete file?

I understand that we would probably lose some nice properties, like the
cryptographic guarantee that files are identical, but that can be
acceptable for us: If a video file is changed, it is changed a lot. We
do not need to make sure that every bit is identical. If it changes, the
file size will tell in 90% of the cases, and if you make a partial hash
of the first 100 KB and the last 100 KB of the file, combined with a
time-stamp check, it is enough for our purposes.
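
To make the idea concrete, here is a rough sketch in Python of the kind
of check I have in mind. Nothing in it is monotone code: the function
name quick_fingerprint and the choice of SHA-1 for the partial hash are
assumptions for illustration only.

```python
import hashlib
import os

EDGE = 100 * 1024  # 100 KB from each end of the file


def quick_fingerprint(path, edge=EDGE):
    """Return (size, mtime, partial SHA-1) for a file.

    Hypothetical helper: only the first and last `edge` bytes are
    hashed, so a change confined to the middle of a large file is
    caught only if it also changes the size or the time-stamp.
    """
    st = os.stat(path)
    size = st.st_size
    h = hashlib.sha1()
    with open(path, "rb") as f:
        h.update(f.read(edge))
        if size > edge:
            # Never re-read bytes already hashed as part of the head.
            f.seek(max(size - edge, edge))
            h.update(f.read(edge))
    return size, int(st.st_mtime), h.hexdigest()
```

Two files would be treated as identical when all three fields match.
Files of up to 200 KB are hashed completely, so the shortcut only
applies to the large ones.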

>    - support binary files on CVS import: 1 person-week. probably
>      involves some small changes to and,
>      nothing major, but those files are a little hairy.

OK, that seems manageable. From reading the rcsfile manual and the source
code for, I'm not sure what changes it requires. It seems to
me that the parsing should work just fine with binary files as it is,
and I'm not sure what else needs changing. Do you know of any problems
in this file?

Regarding, that is too big for a 5 minute inspection, so
do you know in more detail what changes are required?

Maybe the best bet is simply to try, and see what breaks?

>    - support ~2^32 byte files (rather than current 2^24): 1
>      person-week. all work is in sqlite/*, and is plain ANSI C. I
>      have a partial patch sitting around which gets about half of
>      this done, but there are some cases in the database virtual
>      machine which aren't handled, and I'd want to talk to the sqlite
>      author about it to be sure, maybe have it folded in upstream.

Regarding big-file support: I was wondering whether that could be done
without changing sqlite. A big file can be defined as a concatenation of
many smaller chunks. Even if you bump the limit to 2^32, it will still
fail for some users, so maybe it's better to come up with a scheme
without such a "low" limit?
I'm not sure how that fits into the architecture, though.
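
As a sketch of what I mean by chunks, in Python: the big file is stored
as an ordered list of chunk IDs (a manifest), and each chunk is stored
once under its own hash. The names store_in_chunks and reassemble, the
dict used as a store, and the 8 MB chunk size are all assumptions for
illustration; this says nothing about how monotone or sqlite would
actually lay it out.

```python
import hashlib
import io

CHUNK_SIZE = 8 * 1024 * 1024  # assumed value, below the 2^24 byte limit


def store_in_chunks(stream, store, chunk_size=CHUNK_SIZE):
    """Read `stream` piece by piece, put each piece into `store` keyed
    by its SHA-1, and return the ordered list of chunk IDs."""
    manifest = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunk_id = hashlib.sha1(chunk).hexdigest()
        store[chunk_id] = chunk  # identical chunks are stored only once
        manifest.append(chunk_id)
    return manifest


def reassemble(manifest, store):
    """Concatenate the chunks named in `manifest`, in order."""
    return b"".join(store[chunk_id] for chunk_id in manifest)


# Usage with a small in-memory "file" and a tiny chunk size.
store = {}
manifest = store_in_chunks(io.BytesIO(b"x" * 2500), store, chunk_size=1000)
assert reassemble(manifest, store) == b"x" * 2500
```

With a layout like this no single stored item exceeds the chunk size, so
the total file size is bounded only by the database limit.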

Another area where big binary files are challenging, besides storing
and hashing, is obviously distribution. When you have big binary
files, you only want to send the bits you really have to send, and you
want to go to great lengths to send them only once, given unstable
connections, low bandwidth and the like. So this requires an
efficient network protocol that supports distribution in chunks.

How is monotone doing in this area? Full support requires reimplementing
rsync, but maybe monotone can reuse rsync or something?
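
A much simpler scheme than rsync might already go a long way. The
Python sketch below uses fixed-size chunks identified by their hash,
and the sender transmits only the chunks the receiver does not already
hold. This is not rsync's rolling-checksum algorithm and not anything
monotone does today; the function names and the fixed chunk size are
assumptions for illustration.

```python
import hashlib


def chunk_ids(data, chunk_size):
    """SHA-1 of each fixed-size chunk of `data`, in order."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def missing_chunks(manifest, receiver_has):
    """Return (index, chunk_id) pairs the sender still has to transmit.

    `receiver_has` is the set of chunk IDs already on the other side,
    including chunks that arrived before a dropped connection, so an
    interrupted transfer resumes where it stopped.
    """
    seen = set(receiver_has)
    todo = []
    for index, chunk_id in enumerate(manifest):
        if chunk_id not in seen:
            todo.append((index, chunk_id))
            seen.add(chunk_id)  # send a repeated chunk only once
    return todo
```

The weakness of fixed-size chunks is that inserting a single byte near
the start of a file shifts every later chunk boundary, which is the
case rsync's rolling checksum is designed to handle.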

>    - port to windows: 1-4 person-weeks. could be as simple as doing a
>      configure with a cygwin toolchain, could be uglier if either you
>      don't like cygwin, or cygwin doesn't work, or there are windows
>      peculiarities I haven't insulated against.

It seems Zbynek Winkler more or less nailed that one today, so that is
good news.

The next step would be to develop a TortoiseCVS/TortoiseSVN kind of
client, but that should not require any changes in monotone as such.

Asger Ottar Alstrup
