
From: Asger Kunuk Ottar Alstrup
Subject: [Monotone-devel] RE: Support for binary files, scalability and Windows port
Date: Wed, 21 Jan 2004 09:16:03 +0100

> the DAG is kept over *manifest* versions, not file versions...

OK, I understand. That obviously reduces the risk of conflicts a lot
when we are talking about source code - the toggling situation will not
occur often in that case. So, I think I have to agree with you: I would
be happy to take a wait-and-see approach to the history question: if it
turns out that this is in fact a problem in real life, then deal with it
at that point. And the first thing to try is to introduce a policy that
people should keep a ChangeLog.

Your write-up of the three separate tasks is excellent:

>   - the need to mark some files as "opaque", in the sense that they

Yes, being able to tune the system by trading hard-disc space for speed
would obviously be a good thing. For starters, though, it is not a
show-stopper: CVS does not support this today, and we get by with CVS.
(See below regarding hashes.)
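Purely to illustrate the trade-off (this is not monotone's actual storage code, and the helper names are invented), a small Python sketch: an "opaque" file is kept as one compressed blob, so reading it back is a single decompress, whereas a delta-chained file must replay every delta since the base version.

```python
import zlib

def store_opaque(data: bytes) -> bytes:
    # Store the file whole (compressed): costs more disk per version,
    # but a checkout is one decompress, independent of history length.
    return zlib.compress(data)

def fetch_opaque(stored: bytes) -> bytes:
    return zlib.decompress(stored)

def fetch_delta_chained(base: bytes, deltas) -> bytes:
    # Delta storage saves disk, but reconstruction replays the whole
    # chain -- slow for large, frequently changed binary files.
    data = base
    for apply_delta in deltas:
        data = apply_delta(data)
    return data
```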

>   - the need to support very large files

Of course, this is the real show-stopper for now. So, let's split this
into three separate things:

- The possibility to store very large files

- The efficient working of very large files

- The efficient distribution of very large files

Regarding the second item: today, CVS loads the big files into memory
once. That works for us now, so we can live with the same in monotone.
(But three copies in memory is not acceptable in the common use cases.)
Of course, a fixed cap on memory use would be better, but for starters
it is not a show-stopper.
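To make "a fixed cap on memory use" concrete, a minimal sketch in Python (the function name is invented): the file is processed through a fixed-size buffer, so memory use stays constant no matter how large the file is.

```python
CHUNK_SIZE = 1 << 20  # 1 MiB working buffer; memory use is O(1) in file size

def copy_bounded(src_path: str, dst_path: str) -> None:
    # Never hold the whole file in memory, let alone three copies of it.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(CHUNK_SIZE):
            dst.write(chunk)
```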

Regarding the third item, I'm not sure how your proposal of 256-way
trees addresses this, but it is semi-important to us, given that we are
already saturating the lines.

>   - the need, possibly, to change the way files are identified for one
>     of two reasons: hashing takes too long, and (possibly) there are
>     unacceptable failure cases in history graphs built from hashes.

We should split this into two separate requirements:

- The need to reduce hashing time on big files

- The need to avoid failure cases in the history graphs

where we are ready to adopt a wait-and-see strategy for the second one.
For the first one, simply switching to a faster, but less accurate,
hash might work for starters.
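To give the first requirement some shape, a hedged sketch in Python: both functions stream the file through a small buffer; SHA-1 (the hash monotone uses for identification) is set against Adler-32, a much faster but far weaker checksum, purely to illustrate the speed/accuracy trade, not as a recommendation.

```python
import hashlib
import zlib

CHUNK = 1 << 20  # 1 MiB buffer keeps memory bounded even for huge files

def sha1_file(path: str) -> str:
    # Strong, collision-resistant hash: slower, but safe for identification.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

def adler32_file(path: str) -> int:
    # Much faster checksum, but far weaker: collisions are easy to hit.
    value = 1  # standard Adler-32 initial value
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF
```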

I can think of a few other things:

- The footprint on each person's machine should be a function of what
they have checked out - not a function of the size of the full
repository or of the full history. Like CVS.

The reason for this is that while we are ready to buy big discs for a
few people, it is a different matter if every person using the system
needs a 200 GB disc. It would take days to get started ;-)

- Native port to Windows. It seems that the attempt to compile with
VS.NET failed for Zbynek. You already said you were ready to accept
patches for this.

- Pruning of the history based on an "expiration" duration. For some
things, we do not need to store a full trace of every change for all
eternity. Consider our binary builds. Every day, we make a number of new
builds of our products. These are checked into CVS, mostly to allow
easy distribution to the testers. However, we do not need to save the
history of the builds for eternity. Once in a while, a build is
promoted for release, and it receives a tag. Other than that, we are
happy to only be able to go back, say, 2 weeks in the history, in
addition to the tagged versions. So, beyond 2 weeks back, it would be
fine to discard the history, except for the tagged versions.

The reason for this is twofold:
- It obviously conserves disk space
- It might speed up the common use, since the history data structures do
not grow so much
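The retention policy described above could be sketched like this in Python (the revision record is a hypothetical tuple, not monotone's actual data model): keep everything inside the window, plus any tagged revision regardless of age.

```python
from datetime import datetime, timedelta

def prune(revisions, now, keep_window=timedelta(weeks=2)):
    """Keep revisions newer than the window, plus every tagged revision.

    `revisions` is a list of (rev_id, timestamp, tags) tuples -- an
    illustrative format, not monotone's internal representation.
    """
    kept = []
    for rev_id, when, tags in revisions:
        if tags or now - when <= keep_window:
            kept.append(rev_id)
    return kept
```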

> would this set of changes satisfy your needs?

Yes, for sure, and we can get by with less for starters. There are two
other things, as already mentioned:

- The CVS import needs to support binary files. The proposed strategy is
to simply try, and then fix what does not work.

- TortoiseCVS replacement. We would have to work on that.

I'm happy that we are converging on a way ahead, and it seems that the
required work is manageable within a 6-developer-month time frame.

The question of course is what we have overlooked...

Best regards,
Asger Ottar Alstrup




