gnu-arch-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Gnu-arch-users] [BUG] FEATURE PLAN: two stage commit


From: Zenaan Harkness
Subject: Re: [Gnu-arch-users] [BUG] FEATURE PLAN: two stage commit
Date: Mon, 12 Jul 2004 21:14:44 +1000

On Mon, 2004-07-12 at 15:35, Tom Lord wrote:
...
> After those two phases we can clean up a bit.   We can note that B, C,
> D, ... are no longer "half committed" or "conditionally committed".
> We can mark them as fully committed so that reader don't have to
> verify that A has been committed.   But this is just paperwork... it's
> the two phases described above that really matter.
>
> Note that the clean-up step is just an optimization.   A reader who
> encounters a half-committed B can't decide if that's a completed
> commit or not without observing whether or not the corresponding
> commit of A is complete.

This is fine and satisfies the 'transactional' atomicity requirement.
Ideally however, no clients have to be involved in "checking whether a
half-commit is actually a full commit". Ie., clients of the database
(tla) should only see either:

* archive at ZERO, or

* archive at ZERO + A + B + ...

and nothing inbetween. This might be obvious or meant to be anyway, but
I thought I'd clarify the point.

> While B is "half committed" but before A is committed, B functions
> like a lock.   Nobody else should commit ahead of B.   The commit of B
> has to be canceled or completed before someone else can commit to that
> same version.   So between phase 1 and phase 2, the "half commit" of B
> is basically an ordinary version lock.

I see - you are using the changes as locks.

A change is a change.

A lock is a different concept.

Transactions require locks.

Overloading a change to be a lock might be a convenience of
implementation, performance, or somethe other reason, but is not the
'ideal interface'.

> Let's say we want to break that lock.   We can't do that just locally
> to B.   Some other client may be convinced that B is half committed
> and, on the basis of that assumption, commit A.   If A is committed
> but the B commit broken, then we've had an atomic commit of only A, C,
> D, E, .... leaving out B: a bug.

Lock detection between clients is another reason to hide in-progress
transactions from clients and minimize the need for intelligent clients
(forget for a moment that each client is a server for its own repository
- that's not the point here).

> In terms of traces, a client who wants to atomically commit A, B, C,
> ... will do:
> 
>       half-commit B
>       half-commit C
>       half-commit D
>       half-commit E
>         ....
>         commit A              [*]
...

What I think you want to do to simplify this is have a server solely
responsible for commits to its own repository (this might be implicit
already), and have clients take locks within the borders of a
transaction.

This is another of the ACID properties - isolation. We want to isolate
transactions, for which we need locks to provide the needed isolation
between clients (please forgive me if I've stuffed up the terminology -
I think it's right but I'm no DB guru - yet :):

Two clients can commit sets of changes that should be mutually exclusive
and each "either write or not write" each of its "transactions" changes.

What we want is for each client to isolate its transaction against each
other transaction.

So what we want is to define the locks needed:

 - change locks
 - branch/ tree locks
 - whole repository lock

??? (Just guessing here..)

There are rules on locks - see C J Date's classic for the details (I
have it in a box somewhere - still unpacking at this point sorry), such
as ways to take locks (always take highest level lock that you will or
might need first, and finer grained locks subsequently, lock minimal
required (obvious), take minimal lock needed (eg. each of above category
locks can be READ lock or WRITE lock (and I think there might even be
another type of lock).

If you have clear locking semantics (how to take, release and detect a
lock) there are standard idioms/ algorithms that can then be used for
detecting deadlocks, unrolling just the right clients in a deadlock so
that the 'ideal' one can progress, etc (sorry for starting to get
ephemeral here ... I know these things exist and it's the best I can
explain it).

>     > On a related note, I am planning on implementing two-phase
commits with
>     > non-tla resources. 
> 
> In english please?  "non-tla resources?"
> 
>     > Would this fit well within your architecture above?
> 
> No idea since the question is hopelessly vague.
> 
>     > My current thoughts are just to modify commit & lock-revision to
be able
>     > to leave a all-but-committed changeset in place, and to
>     > renamed-to-fully-committed from the command line. 
> 
> That approach, if I understand what you mean, will fail to have ACID
> properties and will therefore be a
> _MAJOR_HUGE_VERY_VERY_BIG_AND_SERIOUS_ regression.
> 
> Ok, the word "regression" is slightly misleading since it wouldn't
> break existing functionality.   But it would add new functionality
> that is incredibly broken in a way that existing functionality
> carefully avoids.

The only problem with a separate locking mechanism is exactly that - it
would be 'separate' or added functionality to the existing system.

Anyway, hopefully the above will contribute in some way.

cheers
zen




reply via email to

[Prev in Thread] Current Thread [Next in Thread]