axiom-developer
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Axiom-developer] Algebra bootstrap


From: Gabriel Dos Reis
Subject: Re: [Axiom-developer] Algebra bootstrap
Date: 06 Feb 2007 23:00:27 -0600

Waldek Hebisch <address@hidden> writes:

| Gabriel Dos Reis wrote:
| > Waldek Hebisch <address@hidden> writes:
| > 
| > | Current compiler (before algebra bootstrap) does not know about Integer.
| > 
| > I don't think so.  
| > 
| > The internal AST that eventually reaches the type analyzer and the
| > semantic analyzer, starting from compTopLelve, already contain mention
| > of Integer (and the other basic types).
| >
| 
| Try starting with minimal databases.  You will get messages as:
| 
| Integer is not a known type
| 
| or
| 
| Boolean is not a known type

Yes, that adds evidence to my claim that the compiler knows about
Integer and other basic types: they are gotten through the databases,
and the compiler (in its current state) cannot do anything meaningful
without that knowledge.  That is what I meant.

| Compiler do have _some_ knowledge: Record, Union, Mapping and
| Enumeration are built-in.

Yes. Record, Union, and MApping are necessary.  

For Enumerations, I have not make my mind yet.  Som built-in support
is needed, but it is not clear to me that level of built-in is needed.

|  There are some more or less random
| pieces of information hardcoded in various parts of the compiler.

Yes, a systematic approach for restructuring the compiler (as well as
the interpreter) is needed.  No doubt.  My primary concern is about
laying down precisely the rules.  That needs some kind of basic "first
principles".  That is what I'm aiming at.  I largely regard the
current setting as a defective departure that needs correction.  The
question is what to correct?  And in which directions?  

| But Integer, Boolean and List are recognized only if present in
| databeses.

and if they are not, the compilation is screwed!  You cannot really
define them as "secondary", because they come as annotation for
constants,a nd they parser knows they must mean something.  They
really are "primary" from the compiler point of view.  The compiler
needs to query the database because, it believes that Integer has more
properties (e.g. Euclidean domain) than it assumes.  That is not the
case for a Record for example, neither for Mapping.  The case of List
is a bit curious.  However, my *guessing* from reading the code for
List is that it was realized that List can be assumed a basic domain,
but later enriched in an algebra file -- embryonic ideas for "post
facto extensions". 

| Note that already postTransform uses semantic information:
| 
| postAtom x ==
|   $BOOT => x
|   x=0 => '(Zero)
|   x=1 => '(One)
|   EQ(x,'T) => 'T_$ -- rename T in spad code to T$
|   IDENTP x and GETDATABASE(x,'NILADIC) => LIST x
|   x
|                ^^^^^^^^^^^^^^^^^^^^^^^

But that really is not semantic information -- you don't need to
evaluate "x" to know what it means.  That information is
purely syntactic, because you can infer it from purely syntactic
considerations looking at the parse form of a definition.

Now, if your point is that the post-parser transformer should be fixed
to clearly delimit the phase, the answer is yes.  As the key is
approach this from the angle of "what it ideally should do?"; then you
look at the current approximation and see where the defect is.

| and parseTransform (more precisely parseHas) again preforms database
| lookup. 

Much less than the interpreter.  Still before and during the type
checking phase, we don't need to consult the databas, except, except,
when we have an "import" statement.  But, even there I'm not sure
looking at the database is the right answer.

|  
| > | And Integer depends on large part of algebra.  So I would say that such
| > | basic compiler while a noble goal would require substantial algebra
| > | rewrite.  Such rewrite in turn need something like Aldor post facto
| > | extension.
| > 
| > By "simple type", I don't expect the basic type to understand beyond
| > simple operations defined on their domains.  
| > 
| > The glorious version of Integer is not needed wholesale when starting
| > the boostrap process.  The glorification is a set that needs to come
| > after.  Yes, post facto extensions are ways to achieve late
| > glorification -- and if you look at the practice, this is not fancy;
| > just think "type classes" (Haskell), which Axiom would have invented
| > two decades ago if it had post facto extensions.
| > 
| 
| I use the following Integer for bootstrap:
| 
| )abbrev category INS IntegerNumberSystem
| IntegerNumberSystem(): Category == with nil
| )abbrev domain INT Integer
| Integer: IntegerNumberSystem with
|     FakeIdentityToAppeaseSpad: % -> %
|   == add
|     FakeIdentityToAppeaseSpad(x) == x
| 
| This is enough to convince compiler that Integer is a valid type.  I do
| not see how better Integer will help in bootstrap.  More precisely to
| bootstrap you need to resolve forward references.  And the problem is
| that in _last_ bootstrap pass there is a lot of forward references.
| I do not think that will help resolve those references.

No, but you're looking at a much higher level and you don't see the
trees. 

The question, for each individual uses of Integer is what is needed?
If you use "+", then the above fake is insufficient.
Where the "post factor" extensions helps is that you can add those
required operations as you go.  This process is very diffrent from
asserting that Integer belongs to a given caterogy.

| I guess that you think about something like classic compiler bootstrap,
| where you need to get translator for small language running and than
| you can use it to translate rest of the compiler.

Yes, but there is nothing un-classic here, except that the runtime
support is a Lisp system.  And precisely, we should take advantage of
that. You extend the small think and load it, and have an improved
version.  That is much simpler that the "classic" compiler bootstrap.
Remember, they bootstrapped Lisp before they bootstrapped C, so if
sometrhing is classic, it certainly is Lsip bootstrapping :-)

|  In particular such
| set of basic domains would allow to write large part of compiler in
| Spad. 

YES!

I believe, I already stated that many times in the past.
I evene recalled Bill saying that from his point of view Boot was
conceived as a mini-SPAD, but I would have to disagree with him on
several technical points :-)


| But current bootstrap problem is really all about databases,

Get rid of the databases, or said differently look at the database as
the final thing with the belts and whistles.  Don't design the
bootstrap process around it.  It is what you build as wrap up.  Think
of it as the .so or .a you ship to user.  You don't build your
compiler to depend on it for its own bootstrap.  You design your
compiler so that it can produce the library and use it.  Shift
perspective. 

| we do not care if during database bootstrap we get correct code (or
| any code at all),

We should care that we get correct code.  Compiler miscompilation is a
nightmarre to debug.

| we just want to collect type declaration.  

And most of this can be done at the abstract syntax level -- given the
relatively simple syntax.

| And
| we need a lot of declarations in databases to start compiling
| current Axiom algebra.

If we get to the point that the compiler can handle a unit
(e.g. source file) at a time, then you can rely on mutually recursive
definition to sort out the initial problem.

[...]

| > | > We must also be able to process a file without first having to split
| > | > it into several chunks.  We have been able to process SPAD files that
| > | > way; making )abbrev a no-op.
| > | > 
| > | 
| > | In longer run sure.
| > 
| > Well, Axiom has 25-year horizon; so not sure what you mean by 
| > "longer run" :-)
| > 
| 
| Few months and longer.

OK, that is nothing then :-)

| > | ATM I am using quite different method: I dumped
| > | parse trees of the whole Axiom algebra into a file.  So now I can load
| > | the whole algebra just as a single S-expression. 
| > 
| > Well, (almost) everything is an S-expression; so I must ask which
| > "form" do you save an load?  The parse tree?  the parse form?  The
| > internal AST? or the generated Lisp code?
| > 
| 
| I wrote: parse tree.  More precisely I took output of Meta parser, just
| before call to postTransform.  I replaced call to postTransform by
| call to a simple flattener (to replace nested trees having semicolon
| as operator by lists) and pretty-printed the result of flattener 
| (without flattening the expression had too much nesting and
| pretty-printer could not print it correctly). 

OK.  I'm nervous about having to duplicate the work of the post-parser
transformer.  It should be fixed so that you don't have to outsource
its job.

[...]

| My Lisp code is throw-away code (or research if you prefer such name).
| The basic thing is that I am getting better turnaroud with sbcl.  
| I can load the algebra tree into sbcl in about half of second (I am
| loading it from a fasl).  Tree walk trough this tree takes small part
| of second.  If I have large output most of the time is spent pretty-printing
| the results, which may take few seconds.  If results are small the
| whole run may finish in a second.  I do not see how I could get
| such turnaroud working inside Axiom tree.  FYI, Axiom needed about
| 70 seconds to parse algebra -- I certainly do not want to loose
| 70 seconds each time I try something.

Now, the question is whether it is an inherent inefficient in Axiom,
or its runtime system (GCL).  I know that some long time GCL users
have complained about GCL spending too much time doing unnecessary
work (mostly done on its behalf by GCC).

If it is an inherent deficiency in Axiom, we must know.

For example, there are many places in where an a-list is used, and
that a-list may induce in some places linear search -- when a
hash-table or just a good tree would have been better.  The 
the Environment (third slot of the triplet) is a horrible mess of
a-lists. 

| > [...]
| > 
| > | Concerning abbrev declaration: I am not sure how to find out wether
| > | something is a package or a domain without using information from
| > | abbrevs.
| > 
| > We should get to a poing where we don't need to distinsguish that!
| > 
| > Now, if you look at a capsule, it tells your whether something "is"
| > for a domain or not by the presence or absence of 'domain in the
| > category declaration.  Again look at the the internal AST produced by 
| > parseTransform.
| > 
| 
| I do not want to use output of parseTransform because it depends on
| databases. 

yes but the uses of the databases at that particular point are not
necessary. If you fix that, do you still need to duplicate the job?

| If databases miss some information the resulting tree
| is incorrect (sometimes the compiler will later fix such a tree, but
| many cases lead to errors).  

IF you can process a file in a whole -- which you can with the
skeleton I posted earlier -- you don't need to consult the databases.

| 
| I think that currect Axiom apprach to compilation has serious flaw.
| Namely postTransform and parseTransform are dane _before_ macro
| expansion.  This means that they have only limited syntactic
| information and some transformations just work blindly.  Both
| parseTransform and postTransform compensate this lack of syntactic
| data by use of semantic informations from databases.
| 
| I think that macro expansion should be done just after initial parse
| (before any semantic transformations), that is why I have written
| my macro expander.  BTW: current Axiom way allow pretty weird 
| macro tricks, but it seems that algebra uses only small, well
| behaved subset.

I believe I don't understand why the macro expansion *needs* to be
done before postTransform and parseTransform.  I'm not saying you're
wrong; I'm saying "can you elaborate?".

-- Gaby




reply via email to

[Prev in Thread] Current Thread [Next in Thread]