Background info abot the unfyication tool

guile-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Background info abot the unfyication tool

From:	stefan
Subject:	Background info abot the unfyication tool
Date:	Tue, 18 May 2010 19:42:01 +0200
User-agent:	KMail/1.12.4 (Linux/2.6.31.12-0.2-desktop; KDE/4.3.5; x86_64; ; )

Hi,

Want to share some more info about what the *** I am trying to do.

(Part I)

On the unification, unification match and backtracking.

first off, unification. 

Let capitals, X,Y,Z be varible and x,y,z etc symbols. Let [X Y Z  | W] 
denotes lists all in the tradition of prolog. Let <u> be unification
and X = a, mean that X binds to a!!


X <u> 1, means X binds to 1
X <u> Y, means X and Y share storage and if later  X <u> a means that both 
X = a and Y = a.

And this naturally expands into list forms like

[X a] <u> [b Y]  means succeed and Z = b and Y = a

note a <u> b  will fail

So this is the basic idea of the vanilla unification.

But there is a nice litle glitch. What about
X <u> [a X],  X would mean [a ...] here so it has a meaning 
under the assumpton that we allow i8 sized structures :-). But to 
be safe from infinite loops, one could just dissalow and say 
that the above will fail. Hence the need of a loop check when
unify structures. Actually this is a feature that one could want
to relax.

Destructuring in a match in a unifying flavor is in the simplest formulation

consider 
(umatch Q ([X a Y | L] := ... ))

This is like
(let ((X (var!))
      (Y (var!))
      (L (var!)))
  (if (<u> [X A Y | L] Q)
      ...
      (error "failed to match")))

But this introduce unessesary consings and can be done more smartly. This
the goaö of the latest mails I send to the list.

Backtracking. Backtracking is simply put

(let ((X (var!))
      (Y (var!))
      (L (var!)))
  (let ((frame-nr (newframe)))    
    (if (<u> Q [X a Y | L])
      ...
      (begin
        (unwind-to frame-nr)
         (if (<u> Q [X Y])
             ...
             (begin
               (unwind frame-nr)
               ...))))))

so, variables are allocated on a stack and when new bindings
is done the old values is saved on an undo-array. At a newframe indices
into the undo list and the variable stack is saved. 

The idea with this system is to be able to batch up changes and then undo
all the stuff that went into it during the batch at the backtrack. 
The undo operation is faster then the creation operation if there are more 
then a few changes but only the first change of a variable
in a frame is stored on the undo list in order to potentially save space
and a little speed. This feature is maybe not as usueful but keeping the 
granularity of the allocated object and a SCM size means that there is room to 
keep track the frame a variable rebinds at. There is some flags that describes
the type of an object. They are

1. EQ; EQUAL?   - if it binds to data, prolog applications commoly use symbols
                  and integers. so a difference between EQ and EQUAL can be of 
                  use

2. CONS         - need to be able to tell if the object is a cons cell.
3. REF          - need to quickly tell if the data is a reference to another 
                  object!!

so basically to lookup a binding is close to
SCM * f(SCM * id)
{
retry:
  if(FLAG_IS_POINTER(*id))
    { 
      id = (SCM *) *id; 
      goto retry;
    }
  return id;
}

This is a quite fast way of doing lookups of objects. The
speed should simple be bounded by transfering the id 
locations from the cache. This way of doing lookups was believed
to be optimal. But one could speculate that it could be wise to 
minimize the number of lookups. So the implementation now
walk the links. But keep pointers on a local array. Then
when the lookup has finished, all the pointers are directed
directly to the last lookup. E.g.

Before lookup:
x->y->z->w->*

After lookup:
x->w->*
y->w->*
z->w->*

returns w;

This means that we only creates shortcuts when we can amortize 
that on the cost of dooing lookup which is as the scale grows, 
the heaviest part becasue the extra operations of shortcuting
work with data very close in the cache hierarky.


Let's consider an example.

consider the problem of a directed graph and that we would like
to define a relation path-to, =>, e.g. (=> X Y) mean there is a 
path from X to Y. Given the arrow relation ->, eg. (-> X Y) means
X points to Y.

So. the path to is a very simple prolog program, e.g.

  i)  (=> X Y) :- (-> X Y).
 ii)  (=> X Y) :- (-> X Z), (=> Z Y).
iii)  (=> X Y) :- (=> X Z), (-> Z Y).

Note how an extra variable Z has shown up on the right hand side

now consider
1. (-> a b)
2. (-> b c)
3. (-> c a)
4. (-> c d)
5. (-> d e)

and ask if we have
(=> a e).
then i) fails
    ii) succeeds first with (-> a b) then (=> b e) is asked for
     i) fails
    ii) succeeds first with (-> b c) then (=> c e) is asked for
     i) fails
    ii  succeds  first with (-> c a) then (=> a e) is asked for

and so on ad infinitum.

The thing is that this scheme works if we assume that the graph is acyclic. 

The cool thing though is that we can fix this, 
basically we mark all arrows (-> a b) that has been visited via a special
hashmap kind of datastructure and then we add something like,

 ii)  (=> X Y) :- (if (not-before? X Y)) ,
                  (-> X Z)               , 
                  (mark X Z)             ,
                  (=> Z Y)               .  
iii)  (=> X Y) :- (if (not-before? Z Y)) ,
                  (-> Z Y)               ,
                  (mark Z Y)             , 
                  (=> X Z)               . 

and now we know how to easilly find all paths form x to y in a general
directed graph that passes atmosst 3 times across an arrow, right!
(well the graph cannot be too big :-)

using a matcher we could do this something along

(defmacro next-if-fail (P) `(let ((Ret ,P)) (if Ret Ret (next))))
(defmacro fail-if-fail (P) `(let ((Ret ,P)) (if Ret Ret  #f   )))
(udef -> 
     (a b CC   (next-if-fail (CC)))
     (b c CC   (next-if-fail (CC)))
     (c a CC   (next-if-fail (CC)))
     (c d CC   (next-if-fail (CC)))
     (d e CC   (next-if-fail (CC)))
     (_ _ CC   #f))

(udef =>
     (X Y CC   (next-if-fail (-> X Y CC)))
     (X Y CC   (let ((F (newframe))
                     (Z (var!)))               
               (next-if-fail 
                (-> X Z
                (lambda ()
                  (if (not-before? X Z)
                      (begin
                        (mark X Z)
                        (fail-if-fail (=> Z Y CC)))))))))
     (X Y CC   (let ((Z (var!)))
                 (next-if-fail 
                  (-> Z Y
                  (lambda ()
                    (if (not-before? Z Y)
                        (newframe 
                           (mark Z Y)
                           (fail-if-fail (=> Z Y CC))))))))))

                   
(define (not-before? X Y)
  (letrec ((f (ulambda ((X Y (X Y . L))  #t)
                       ((X Y (_   . L))  #t)
                       (()               #f))))
    (f X Y *befores*)))

(define (mark X Y) (ucons X (ucons Y *befores*)))


Notice how we can write a macro :- so that we can skip all the
lambdas and be more close to the prolog definitions. Also, to
scale better one could implement a real hashmap that behaves well
under the undo scheme.

This is all for now.

Stefan

[Prev in Thread]

Current Thread

[Next in Thread]

Background info abot the unfyication tool, stefan <=
- Re: Background info abot the unfyication tool, Andy Wingo, 2010/05/21
- Re: Background info abot the unfyication tool, stefan, 2010/05/22
- Re: Background info abot the unfyication tool, stefan, 2010/05/22

Prev by Date: Re: fmatch
Next by Date: Re: Merging R6RS libraries?
Previous by thread: Re: %module-public-interface
Next by thread: Re: Background info abot the unfyication tool
Index(es):
- Date
- Thread