--- Begin Message ---
Background info abot the unfyication tool
Tue, 18 May 2010 19:42:01 +0200
KMail/1.12.4 (Linux/188.8.131.52-0.2-desktop; KDE/4.3.5; x86_64; ; )
Want to share some more info about what the *** I am trying to do.
On the unification, unification match and backtracking.
first off, unification.
Let capitals, X,Y,Z be varible and x,y,z etc symbols. Let [X Y Z | W]
denotes lists all in the tradition of prolog. Let <u> be unification
and X = a, mean that X binds to a!!
X <u> 1, means X binds to 1
X <u> Y, means X and Y share storage and if later X <u> a means that both
X = a and Y = a.
And this naturally expands into list forms like
[X a] <u> [b Y] means succeed and Z = b and Y = a
note a <u> b will fail
So this is the basic idea of the vanilla unification.
But there is a nice litle glitch. What about
X <u> [a X], X would mean [a ...] here so it has a meaning
under the assumpton that we allow i8 sized structures :-). But to
be safe from infinite loops, one could just dissalow and say
that the above will fail. Hence the need of a loop check when
unify structures. Actually this is a feature that one could want
Destructuring in a match in a unifying flavor is in the simplest formulation
(umatch Q ([X a Y | L] := ... ))
This is like
(let ((X (var!))
(if (<u> [X A Y | L] Q)
(error "failed to match")))
But this introduce unessesary consings and can be done more smartly. This
the goaö of the latest mails I send to the list.
Backtracking. Backtracking is simply put
(let ((X (var!))
(let ((frame-nr (newframe)))
(if (<u> Q [X a Y | L])
(if (<u> Q [X Y])
so, variables are allocated on a stack and when new bindings
is done the old values is saved on an undo-array. At a newframe indices
into the undo list and the variable stack is saved.
The idea with this system is to be able to batch up changes and then undo
all the stuff that went into it during the batch at the backtrack.
The undo operation is faster then the creation operation if there are more
then a few changes but only the first change of a variable
in a frame is stored on the undo list in order to potentially save space
and a little speed. This feature is maybe not as usueful but keeping the
granularity of the allocated object and a SCM size means that there is room to
keep track the frame a variable rebinds at. There is some flags that describes
the type of an object. They are
1. EQ; EQUAL? - if it binds to data, prolog applications commoly use symbols
and integers. so a difference between EQ and EQUAL can be of
2. CONS - need to be able to tell if the object is a cons cell.
3. REF - need to quickly tell if the data is a reference to another
so basically to lookup a binding is close to
SCM * f(SCM * id)
id = (SCM *) *id;
This is a quite fast way of doing lookups of objects. The
speed should simple be bounded by transfering the id
locations from the cache. This way of doing lookups was believed
to be optimal. But one could speculate that it could be wise to
minimize the number of lookups. So the implementation now
walk the links. But keep pointers on a local array. Then
when the lookup has finished, all the pointers are directed
directly to the last lookup. E.g.
This means that we only creates shortcuts when we can amortize
that on the cost of dooing lookup which is as the scale grows,
the heaviest part becasue the extra operations of shortcuting
work with data very close in the cache hierarky.
Let's consider an example.
consider the problem of a directed graph and that we would like
to define a relation path-to, =>, e.g. (=> X Y) mean there is a
path from X to Y. Given the arrow relation ->, eg. (-> X Y) means
X points to Y.
So. the path to is a very simple prolog program, e.g.
i) (=> X Y) :- (-> X Y).
ii) (=> X Y) :- (-> X Z), (=> Z Y).
iii) (=> X Y) :- (=> X Z), (-> Z Y).
Note how an extra variable Z has shown up on the right hand side
1. (-> a b)
2. (-> b c)
3. (-> c a)
4. (-> c d)
5. (-> d e)
and ask if we have
(=> a e).
then i) fails
ii) succeeds first with (-> a b) then (=> b e) is asked for
ii) succeeds first with (-> b c) then (=> c e) is asked for
ii succeds first with (-> c a) then (=> a e) is asked for
and so on ad infinitum.
The thing is that this scheme works if we assume that the graph is acyclic.
The cool thing though is that we can fix this,
basically we mark all arrows (-> a b) that has been visited via a special
hashmap kind of datastructure and then we add something like,
ii) (=> X Y) :- (if (not-before? X Y)) ,
(-> X Z) ,
(mark X Z) ,
(=> Z Y) .
iii) (=> X Y) :- (if (not-before? Z Y)) ,
(-> Z Y) ,
(mark Z Y) ,
(=> X Z) .
and now we know how to easilly find all paths form x to y in a general
directed graph that passes atmosst 3 times across an arrow, right!
(well the graph cannot be too big :-)
using a matcher we could do this something along
(defmacro next-if-fail (P) `(let ((Ret ,P)) (if Ret Ret (next))))
(defmacro fail-if-fail (P) `(let ((Ret ,P)) (if Ret Ret #f )))
(a b CC (next-if-fail (CC)))
(b c CC (next-if-fail (CC)))
(c a CC (next-if-fail (CC)))
(c d CC (next-if-fail (CC)))
(d e CC (next-if-fail (CC)))
(_ _ CC #f))
(X Y CC (next-if-fail (-> X Y CC)))
(X Y CC (let ((F (newframe))
(-> X Z
(if (not-before? X Z)
(mark X Z)
(fail-if-fail (=> Z Y CC)))))))))
(X Y CC (let ((Z (var!)))
(-> Z Y
(if (not-before? Z Y)
(mark Z Y)
(fail-if-fail (=> Z Y CC))))))))))
(define (not-before? X Y)
(letrec ((f (ulambda ((X Y (X Y . L)) #t)
((X Y (_ . L)) #t)
(f X Y *befores*)))
(define (mark X Y) (ucons X (ucons Y *befores*)))
Notice how we can write a macro :- so that we can skip all the
lambdas and be more close to the prolog definitions. Also, to
scale better one could implement a real hashmap that behaves well
under the undo scheme.
This is all for now.
--- End Message ---