gnu-arch-users

Re: [Gnu-arch-users] [BUG] FEATURE PLAN: two stage commit


From: Zenaan Harkness
Subject: Re: [Gnu-arch-users] [BUG] FEATURE PLAN: two stage commit
Date: Tue, 13 Jul 2004 09:29:29 +1000

On Tue, 2004-07-13 at 05:31, Tom Lord wrote:

>     > From: Zenaan Harkness <address@hidden>

>     >>> My current thoughts are just to modify commit & lock-revision to
>     >>> be able to leave an all-but-committed changeset in place, and to
>     >>> rename it to fully-committed from the command line.
> 
>     >> That approach, if I understand what you mean, will fail to have
>     >> ACID properties and will therefore be a
>     >> _MAJOR_HUGE_VERY_VERY_BIG_AND_SERIOUS_ regression.
> 
>     >> Ok, the word "regression" is slightly misleading since it wouldn't
>     >> break existing functionality.   But it would add new functionality
>     >> that is incredibly broken in a way that existing functionality
>     >> carefully avoids.
> 
>     > The only problem with a separate locking mechanism is exactly that - it
>     > would be 'separate' or added functionality to the existing system.
> 
> If I understand the idea of just exposing the half-committed mechanism
> and letting people use that directly the problem is that the outcome
> of a composite transaction could wind up being that half of the
> transaction had effect and half was rolled back.  "Well behaved"
> scripts using the raw functionality could, of course, implement
> exactly the robust algorithm I've been talking about but they could
> also freely screw up.  Why not, then, build that algorithm in, prevent
> clients from screwing it up, and avoid adding a new user-visible
> revision state ("half committed") to user-visible archive semantics?
> (It's ok if the half-committed state is exposed in an advisory way,
> the same way that revision locks currently are;  it's just that 
> it shouldn't go deeper than that.)

So my immediate thought is - has anyone read the "alternative commercial
product"'s manual to see what it does?

OK, back to the start of the thread:

On Tue, 2004-06-08 at 06:22, Tom Lord wrote: 
> This will enable "distributed commit" -- a simultaneous, atomic commit
> to multiple branches (possibly in separate archives) at once.

I'll ignore the "separate archives" bit here - and I think we should
ignore it unless proven necessary: AIUI, a large point of the arch
model is that changes can be applied to alternative trees in different
sequences. That's how the whole thing becomes decentralized and
therefore scalable to 100s if not 1000s of repos. Give that up not.

So we need to commit to multiple branches atomically.

Well, lock those branches.

And the simplest way to lock is to serialize commits of all 'changes'
(including these new 'meta' changes) to the server's archive.

This implies a meta-change is just another changeset: no point having
separate entities here; keep it uniform to the greatest extent possible.
As Hans Reiser would say, the less numerous the primitives, then
the fewer communication interfaces between primitives are needed
(which grow exponentially with the number of primitives otherwise),
and therefore the greater the flexibility of the system.

Then we have two clients, which, instead of submitting sequences
of changes A+B+C and B+D+E, simply submit two changes X and Y, which
happen to contain ABC and BDE respectively.

Whichever change, X or Y, the server accepts and commits first succeeds.
The second client's submission is rejected.
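
A minimal sketch of that "first accepted wins" rule, under assumptions
that are mine and not from the thread: A..E are read as changes to five
separate branches, the server keeps everything in memory, and a client
states which head it saw for each branch. The names Archive, head and
commit are hypothetical and are not arch commands or APIs.

```python
import threading


class Archive:
    """Toy in-memory archive; all names are hypothetical, not arch's API."""

    def __init__(self, branches):
        # Map each branch name to its ordered list of committed changes.
        self._log = {name: [] for name in branches}
        # One lock serializes every commit, plain or composite.
        self._commit_lock = threading.Lock()

    def head(self, branch):
        """Number of changes committed to the branch so far."""
        return len(self._log[branch])

    def commit(self, changeset):
        """Apply {branch: (expected_head, change)} all-or-nothing.

        Returns True if every branch was still at the head the client saw,
        and False, with nothing applied, otherwise.
        """
        with self._commit_lock:
            # Check every branch before touching any of them.
            for branch, (expected_head, _change) in changeset.items():
                if len(self._log[branch]) != expected_head:
                    return False  # another commit landed first: reject it all
            for branch, (_expected_head, change) in changeset.items():
                self._log[branch].append(change)
            return True


archive = Archive(["A", "B", "C", "D", "E"])
# Both clients build their composite change against the same starting heads.
x = {b: (archive.head(b), "X") for b in ("A", "B", "C")}
y = {b: (archive.head(b), "Y") for b in ("B", "D", "E")}
x_ok = archive.commit(x)  # accepted first
y_ok = archive.commit(y)  # rejected: branch B has moved on
```

Because the check and the append both happen under one lock, a rejected
composite change leaves no branch half-updated; the rejected client
would have to rebuild its change against the new heads and resubmit.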

These transactions each occur in how much time?

If such a commit takes too long to scale,
how else can the time be reduced without introducing complicated,
error-prone locking semantics that require client intelligence?

If you really have hundreds of commits per minute or thousands per hour,
this is not a 'normal' repository right? And even such a "problem" can
be solved if the submissions can in any way be hierarchically partitioned
(non-interacting branches), where the cross-branch commits are rare - you
see, I'm having difficulty finding an actual example of the problem we
are trying to solve. We are not out to be a 1000s of TPM DB, right?

From memory, the Linux kernel's official repository was running on a
lowly Pentium 200 or something for its first year or two. How big are we
trying to get?

And if performance is within this realm of reasonableness that I find it
hard to think outside of, branch locks would be enough to do what we need
to do here. There are established POSIX interfaces for creating
"lock files" - so just have such a standard zero-byte lock file for each
branch. Of course if you fail to grab (create) the lock file, then the
naive implementation is to poll until it is available. Which is why you'd
probably want a 'proper' locking mechanism in the server (i.e. an in-memory
per-branch semaphore or whatever the appropriate lock term is).
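
A rough sketch of the lock-file variant, with the naive polling. The
atomic create uses open() with O_CREAT | O_EXCL, which fails if the
file already exists. The function names, the ".lock" suffix and the
timeout are illustrative assumptions; taking the locks in sorted order
is my addition (so two clients cannot each hold a lock the other
needs) and is not something proposed in this thread.

```python
import os
import tempfile
import time


def try_lock(lock_dir, branch):
    """Try to create the branch's zero-byte lock file; True if we hold it."""
    path = os.path.join(lock_dir, branch + ".lock")
    try:
        # O_CREAT | O_EXCL makes "create only if absent" a single atomic step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock(lock_dir, branch):
    """Release a branch lock by removing its lock file."""
    os.remove(os.path.join(lock_dir, branch + ".lock"))


def lock_all(lock_dir, branches, timeout=5.0, poll=0.05):
    """Naive polling: take every lock in sorted order, or give up on timeout."""
    deadline = time.monotonic() + timeout
    held = []
    for branch in sorted(branches):
        while not try_lock(lock_dir, branch):
            if time.monotonic() >= deadline:
                # Back out so a failed attempt leaves no stray locks behind.
                for b in held:
                    unlock(lock_dir, b)
                return False
            time.sleep(poll)
        held.append(branch)
    return True


lock_dir = tempfile.mkdtemp()
first = lock_all(lock_dir, ["b", "a"])                # takes a, then b
second = lock_all(lock_dir, ["b", "c"], timeout=0.2)  # b is held: times out
```

The sleep-and-retry loop is exactly the polling cost mentioned above;
an in-memory per-branch lock inside the server could block waiters
instead of having them spin on the filesystem.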

cheers
zen



