[Chicken-users] Re: compiling jbogenturfa'i .scm => .c takes 2.5 hours


From: Alan Post
Subject: [Chicken-users] Re: compiling jbogenturfa'i .scm => .c takes 2.5 hours
Date: Fri, 26 Nov 2010 15:16:54 -0700

On Fri, Nov 26, 2010 at 09:07:01AM -0700, Alan Post wrote:
> Let me describe the compiler using a toy example, so you can see the
> basic framework I'm trying to compile.  In Chicken Gazette #5, Tony
> Garnock-Jones' packrat parser was introduced, and it contained
> roughly this grammar, though I've adjusted it to use features present
> in the PEG specification that are not present in the packrat egg:
> 
>   expr   <- mulexp #\+ mulexp
>           / mulexp
>   mulexp <- simple #\* simple
>           / simple
>   simple <- digits
>           / #\( expr #\)
>   digits <- [[:digit:]]+
> 
> You can see that expr uses mulexp, which uses simple, which uses
> digits and expr.  The point here is that there are both backward and
> forward references in the grammar.  expr indirectly uses simple,
> which directly uses expr.  One non-terminal may refer to another
> non-terminal yet to be defined, and rules may be mutually recursive.
> 
> genturfa'i generates the following parser for this grammar:
> 
>   (define gerna
>     (letrec ((expr (nunjavni-morji
>                      (nunjavni-jonai
>                        (nunjavni-je
>                          (nunjavni-naselci mulexp)
>                          (nunjavni-lerfu #\+)
>                          (nunjavni-naselci mulexp))
>                        (nunjavni-naselci mulexp))))
>              (mulexp
>                (nunjavni-morji
>                  (nunjavni-jonai
>                    (nunjavni-je
>                      (nunjavni-naselci simple)
>                      (nunjavni-lerfu #\*)
>                      (nunjavni-naselci simple))
>                    (nunjavni-naselci simple))))
>              (simple
>                (nunjavni-morji
>                  (nunjavni-jonai
>                    (nunjavni-naselci digits)
>                    (nunjavni-je
>                      (nunjavni-lerfu #\()
>                      (nunjavni-naselci expr)
>                      (nunjavni-lerfu #\))))))
>              (digits (nunjavni-morji (nunjavni-re "[[:digit:]]+"))))
>       expr))
> 

I'm going to try to summarize the IRC conversation on this subject
from earlier today.  I didn't quite follow everything, so please
speak up if I missed something essential.

Each of the operators in PEG, such as *, ?, and +, is translated into
a procedure prefixed with 'nunjavni-'.  These procedures are
function generators.  They take either a terminal to match (a
character, string, or regular expression) or the result of another
'nunjavni-'-prefixed function.
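
To make the shape of these generators concrete, here is a rough
sketch of what a couple of them could look like.  This is not the
actual genturfa'i code; the names are borrowed from the generated
parser above, but the bodies are purely illustrative, with the input
represented as a plain list of characters:

  ;; Illustrative only -- not the real genturfa'i definitions.  Each
  ;; nunjavni-* call returns a closure that takes the remaining input
  ;; and returns (value . rest) on success, or #f on failure.
  (define (nunjavni-lerfu ch)            ; terminal: match one character
    (lambda (input)
      (and (pair? input)
           (char=? (car input) ch)
           (cons ch (cdr input)))))

  (define (nunjavni-je . javni)          ; sequence: every rule must match
    (lambda (input)
      (let loop ((javni javni) (input input) (vals '()))
        (if (null? javni)
            (cons (reverse vals) input)
            (let ((r ((car javni) input)))
              (and r (loop (cdr javni) (cdr r) (cons (car r) vals))))))))

  (define (nunjavni-jonai . javni)       ; ordered choice: first match wins
    (lambda (input)
      (let loop ((javni javni))
        (and (pair? javni)
             (or ((car javni) input)
                 (loop (cdr javni)))))))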

Since each of these rules is a function generator, each record in
the letrec runs its generators in sequence, until the record holds
the result of the outermost generator (in every case in this code, a
memoization rule), itself tied to the next generated routine, and so
on until a non-terminal symbol or a terminal expression is reached.

Each record in the letrec is then a custom-defined piece of code
that performs the specific sequence of PEG operators that record
defines.
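
Continuing the same illustrative sketch (again, not the real
definitions), a memoizing wrapper and a delayed non-terminal
reference might look roughly like this.  The delayed reference is
what lets the letrec bindings refer to each other in both directions:
the variable is not looked up until the parser actually runs, by
which time every binding has been assigned:

  ;; Still illustrative.  nunjavni-morji caches results per input so a
  ;; rule is evaluated at most once for a given position; nunjavni-naselci
  ;; is written as a macro so the non-terminal variable is dereferenced
  ;; only at parse time, after the letrec bindings exist.
  (define (nunjavni-morji javni)
    (let ((cache '()))
      (lambda (input)
        (cond ((assq input cache) => cdr)
              (else
                (let ((r (javni input)))
                  (set! cache (cons (cons input r) cache))
                  r))))))

  (define-syntax nunjavni-naselci
    (syntax-rules ()
      ((_ javni) (lambda (input) (javni input)))))

In the generated parser above, every rule follows exactly this
pattern: a nunjavni-morji wrapper around a jonai/je tree whose
leaves are terminals or naselci references.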

Loading my 368k jbogerna.scm file into csi takes only a few seconds.
When I compile it, however, something about this design generates a
great deal of code and a long compile time.

I'm given to understand that each of these 'nunjavni-'-prefixed
rules generating a closure creates a lot of entities that need to be
created and compiled.  I don't quite understand exactly what the
problem with this is, as that was the part least clear to me from
the discussion on IRC.

We did some experimentation with inlining, and it was suggested
that generating all of these routines, which themselves call back
into the genturfa'i module, might be causing problems--that if all
of the code were included in a single egg, I would not be getting
such large C files generated during compilation.

It also appears that on older machines it takes substantially
longer than 3.5 minutes to translate the .peg grammar to a .scm
file.  I received some advice on making the hot spots in my code
faster, very similar to the advice I got for improving utf8
performance, and I'm planning to experiment with it.

I would love to better understand what it is I'm doing and why it is
causing a problem.  I'm not sure how to fix my code, as I don't
quite get why what I'm doing is triggering the compiler to generate
such large files.

Thank you again, everyone.  You've been really awesome.

-Alan
-- 
.i ko djuno fi le do sevzi


