Re: Background info abot the unfyication tool

guile-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Background info abot the unfyication tool

From:	stefan
Subject:	Re: Background info abot the unfyication tool
Date:	Sat, 22 May 2010 23:24:06 +0200
User-agent:	KMail/1.12.4 (Linux/2.6.31.12-0.2-desktop; KDE/4.3.5; x86_64; ; )

Hi,

Folks, you asked about using prompts to use delimited continuation aka 
prompts for the unification stuff.

Cool tech.

Anyhow. The main use for this thing in a unifying context is to be able to
cut a whole tree from further out in the limbs of the tree. 

The first step taken was to reuse the already made code 
meaning that it's a test and a minimal version but streamlined to the 
need of backtracking. I did not see any speed degragation or significant 
speedup. My einstein example gives 35ms compared to gprologs 16-20ms to get 
a rough idea of the speed. This example uses the hand made prompts for 
backtracking. To get the idea here is how it looks.

(define (gp-member X W Pr F)    
  (unify-match W (((,X . L) (F (gp-prompt)))
                  (( _ . L) (gp-member X L Pr F))
                  ( _       (gp-unprompt Pr)))))

Under the hood this expands to the logic
(define (gp-member X W Pr F)    
  (let ((sp      (get-sp))
        (next-ip (get-next L2))
        ... some extra stuff ...)
    (row-1 (,X . L)  (F (gp-prompt sp next-ip)))
    (label :L2)
    (row-2 (_  . L)  (gp-member X L Pr F))
    (label :L3)
    (row-3 _         (gp-unprompt Pr))))

Note take care with tail calls here.

So I bothered making shore that the prompt send to F restarts at
an stack-pointer sp, e.g. the sp at the beginning of the matcher, 
and instruction pointer starting at row-2, then set the internal 
state e.g. next-ip th :L3 and issue the matcher. 
So it's targeted to be economic in the context of backtracking and data 
already there for the umatch. The match context variables are aligned 
and there is a associated match frame pointer mfr that is used to track 
where we go next of the matches are nested inside the code. (fr and mfr) 
is what is stored in the prompt and used when it backtracks and recalculates
the state.

So that is actually what I have tested and seam to work for the test case.

The next step is to use guile prompts and to see if there is any performance
hits!! I actually don't expect that but we will see...

/Stefan

--- Begin Message --- Subject: Re: Background info abot the unfyication tool Date: Sat, 22 May 2010 22:42:09 +0200 User-agent: KMail/1.12.4 (Linux/2.6.31.12-0.2-desktop; KDE/4.3.5; x86_64; ; )

--- Begin Message --- Subject: Background info abot the unfyication tool Date: Tue, 18 May 2010 19:42:01 +0200 User-agent: KMail/1.12.4 (Linux/2.6.31.12-0.2-desktop; KDE/4.3.5; x86_64; ; )

Hi,

Want to share some more info about what the *** I am trying to do.

(Part I)

On the unification, unification match and backtracking.

first off, unification. 

Let capitals, X,Y,Z be varible and x,y,z etc symbols. Let [X Y Z  | W] 
denotes lists all in the tradition of prolog. Let <u> be unification
and X = a, mean that X binds to a!!


X <u> 1, means X binds to 1
X <u> Y, means X and Y share storage and if later  X <u> a means that both 
X = a and Y = a.

And this naturally expands into list forms like

[X a] <u> [b Y]  means succeed and Z = b and Y = a

note a <u> b  will fail

So this is the basic idea of the vanilla unification.

But there is a nice litle glitch. What about
X <u> [a X],  X would mean [a ...] here so it has a meaning 
under the assumpton that we allow i8 sized structures :-). But to 
be safe from infinite loops, one could just dissalow and say 
that the above will fail. Hence the need of a loop check when
unify structures. Actually this is a feature that one could want
to relax.

Destructuring in a match in a unifying flavor is in the simplest formulation

consider 
(umatch Q ([X a Y | L] := ... ))

This is like
(let ((X (var!))
      (Y (var!))
      (L (var!)))
  (if (<u> [X A Y | L] Q)
      ...
      (error "failed to match")))

But this introduce unessesary consings and can be done more smartly. This
the goaö of the latest mails I send to the list.

Backtracking. Backtracking is simply put

(let ((X (var!))
      (Y (var!))
      (L (var!)))
  (let ((frame-nr (newframe)))    
    (if (<u> Q [X a Y | L])
      ...
      (begin
        (unwind-to frame-nr)
         (if (<u> Q [X Y])
             ...
             (begin
               (unwind frame-nr)
               ...))))))

so, variables are allocated on a stack and when new bindings
is done the old values is saved on an undo-array. At a newframe indices
into the undo list and the variable stack is saved. 

The idea with this system is to be able to batch up changes and then undo
all the stuff that went into it during the batch at the backtrack. 
The undo operation is faster then the creation operation if there are more 
then a few changes but only the first change of a variable
in a frame is stored on the undo list in order to potentially save space
and a little speed. This feature is maybe not as usueful but keeping the 
granularity of the allocated object and a SCM size means that there is room to 
keep track the frame a variable rebinds at. There is some flags that describes
the type of an object. They are

1. EQ; EQUAL?   - if it binds to data, prolog applications commoly use symbols
                  and integers. so a difference between EQ and EQUAL can be of 
                  use

2. CONS         - need to be able to tell if the object is a cons cell.
3. REF          - need to quickly tell if the data is a reference to another 
                  object!!

so basically to lookup a binding is close to
SCM * f(SCM * id)
{
retry:
  if(FLAG_IS_POINTER(*id))
    { 
      id = (SCM *) *id; 
      goto retry;
    }
  return id;
}

This is a quite fast way of doing lookups of objects. The
speed should simple be bounded by transfering the id 
locations from the cache. This way of doing lookups was believed
to be optimal. But one could speculate that it could be wise to 
minimize the number of lookups. So the implementation now
walk the links. But keep pointers on a local array. Then
when the lookup has finished, all the pointers are directed
directly to the last lookup. E.g.

Before lookup:
x->y->z->w->*

After lookup:
x->w->*
y->w->*
z->w->*

returns w;

This means that we only creates shortcuts when we can amortize 
that on the cost of dooing lookup which is as the scale grows, 
the heaviest part becasue the extra operations of shortcuting
work with data very close in the cache hierarky.


Let's consider an example.

consider the problem of a directed graph and that we would like
to define a relation path-to, =>, e.g. (=> X Y) mean there is a 
path from X to Y. Given the arrow relation ->, eg. (-> X Y) means
X points to Y.

So. the path to is a very simple prolog program, e.g.

  i)  (=> X Y) :- (-> X Y).
 ii)  (=> X Y) :- (-> X Z), (=> Z Y).
iii)  (=> X Y) :- (=> X Z), (-> Z Y).

Note how an extra variable Z has shown up on the right hand side

now consider
1. (-> a b)
2. (-> b c)
3. (-> c a)
4. (-> c d)
5. (-> d e)

and ask if we have
(=> a e).
then i) fails
    ii) succeeds first with (-> a b) then (=> b e) is asked for
     i) fails
    ii) succeeds first with (-> b c) then (=> c e) is asked for
     i) fails
    ii  succeds  first with (-> c a) then (=> a e) is asked for

and so on ad infinitum.

The thing is that this scheme works if we assume that the graph is acyclic. 

The cool thing though is that we can fix this, 
basically we mark all arrows (-> a b) that has been visited via a special
hashmap kind of datastructure and then we add something like,

 ii)  (=> X Y) :- (if (not-before? X Y)) ,
                  (-> X Z)               , 
                  (mark X Z)             ,
                  (=> Z Y)               .  
iii)  (=> X Y) :- (if (not-before? Z Y)) ,
                  (-> Z Y)               ,
                  (mark Z Y)             , 
                  (=> X Z)               . 

and now we know how to easilly find all paths form x to y in a general
directed graph that passes atmosst 3 times across an arrow, right!
(well the graph cannot be too big :-)

using a matcher we could do this something along

(defmacro next-if-fail (P) `(let ((Ret ,P)) (if Ret Ret (next))))
(defmacro fail-if-fail (P) `(let ((Ret ,P)) (if Ret Ret  #f   )))
(udef -> 
     (a b CC   (next-if-fail (CC)))
     (b c CC   (next-if-fail (CC)))
     (c a CC   (next-if-fail (CC)))
     (c d CC   (next-if-fail (CC)))
     (d e CC   (next-if-fail (CC)))
     (_ _ CC   #f))

(udef =>
     (X Y CC   (next-if-fail (-> X Y CC)))
     (X Y CC   (let ((F (newframe))
                     (Z (var!)))               
               (next-if-fail 
                (-> X Z
                (lambda ()
                  (if (not-before? X Z)
                      (begin
                        (mark X Z)
                        (fail-if-fail (=> Z Y CC)))))))))
     (X Y CC   (let ((Z (var!)))
                 (next-if-fail 
                  (-> Z Y
                  (lambda ()
                    (if (not-before? Z Y)
                        (newframe 
                           (mark Z Y)
                           (fail-if-fail (=> Z Y CC))))))))))

                   
(define (not-before? X Y)
  (letrec ((f (ulambda ((X Y (X Y . L))  #t)
                       ((X Y (_   . L))  #t)
                       (()               #f))))
    (f X Y *befores*)))

(define (mark X Y) (ucons X (ucons Y *befores*)))


Notice how we can write a macro :- so that we can skip all the
lambdas and be more close to the prolog definitions. Also, to
scale better one could implement a real hashmap that behaves well
under the undo scheme.

This is all for now.

Stefan

--- End Message ---

[Prev in Thread]

Current Thread

[Next in Thread]

Background info abot the unfyication tool, stefan, 2010/05/18
- Re: Background info abot the unfyication tool, Andy Wingo, 2010/05/21
- Re: Background info abot the unfyication tool, stefan, 2010/05/22
- Re: Background info abot the unfyication tool, stefan <=

Prev by Date: Re: [Guile-commits] GNU Guile branch, master, updated. release_1-9-10-135-g7119448
Next by Date: Re: fmatch
Previous by thread: Re: Background info abot the unfyication tool
Next by thread: Re: Merging R6RS libraries?
Index(es):
- Date
- Thread