[Gnu-arch-users] Re: new language, arch, furth, etc.

gnu-arch-users
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gnu-arch-users] Re: new language, arch, furth, etc.

From:	Tom Lord
Subject:	[Gnu-arch-users] Re: new language, arch, furth, etc.
Date:	Tue, 20 Jul 2004 10:10:59 -0700 (PDT)

(Language design for version variables, continued)


* Reflection

  The generic types chosen for the new langauge are:

        symbols
        strings
        numbers
        sequences
        associations

  For the most part this is an arbitrary choice, made in emulation
  of other successsful systems.

  In one way, this is _not_ an arbitrary choice:  we have just
  enough, with those types, to represent programs in our language
  as data.

  Here is a program that contains the definition of another program:


        (define other-program '((define x 42)
                                (define y "hello")))

  We have no immediate use for that capability but it's a good sign.


* API, Implementation, and Semantic Subtleties

  Appended to this message is the "grammar so far" for the language.
  It has a few informal definitions such as:

        <number-exp> := ``a number in "the usual" notation''

  but, presuming that those are filled in, that's enough 
  for anyone to write a parser for this language.

  What about an "interpreter", though?  We need some way for an 
  application program to read one of these programs and then
  answer questions about the values bound to names.

  In the course of planning such an interpreter, some "semantic edge
  cases" come to light and the language definition has to be expanded
  to cover these:

z
** Multiple Definitions

  What (if anything) should this program mean?:

        (define x "hello")
        (define x "goodbye")

  What is the resulting value of `x', if any?

  The choice here is _somewhat_ arbitrary.  But I couldn't think of
  any clearly right interpretation to give a program with multiple
  definitions for a single name.

  Rather than pick an interpretation, I just chose to say that
  such programs are invalid and that implementations must reject
  them.   That way, if we later want to give some interpretation
  to multiple definitions, we can make that an upward-compatible 
  change to the language.


** Missing Definitions

  Consider the program:

           (define x 42)

  If the application asks for the value for `x' then clearly it
  should get back the integer `42'.

  What if the application asks for the value of `y'?

  One idea is to give every otherwise-undefined name a default
  value.   The value of `y' might be `#undefined', for example.

  Another idea is to rule it an error to ask for the value for
  `y'.   The application asking for `y' just gets an error code
  back.

  The first idea, `#undefined', seems no less general than the
  second.   An application can choose to interpret `#undefined'
  as an error.

  The first idea is more "economical" in terms of concepts.  That's
  because `#undefined' is an ordinary value: there's no need to add
  a separate notion of "error" to the language.

  Finally, the first idea, `#undefined', is also more expressive and
  more general than the second because `#undefined' can be used
  in lots of places.   For example:

        (define kreatures '(beatles byrds #undefined))

  If the application asks for the value of `favorites', it gets
  a sequence:

        (beatles byrds #undefined)

  If it asks for the first or second element of that sequence, it
  gets a symbol (`beatles' or `byrds').   If it asks for the third
  element, it gets the value `#undefined'.

  Is that useful?  Beats me.   But it sure is an easy to implement
  and semantically simple solution.   I can't see any serious problem
  with it.   Prohibiting would be more work than it's worth.

  One possible drawback to this is that these two programs:
z
        (define x 42)
        (define y #undefined)

  and 

        (define x 42)

  are indistinguishable.  The assignment to y of `#undefined' is,
  in essense, a noo.   I don't see any serious problem with that.


** Unrepresentable Values

  Consider this program:

        (define x 403291461126605635584000000)

  that's apparently a perfectly valid assignment of an integer to `x'
  but what does it mean?  The particular number in question here is
  too large to represent as a C integer (of any sort) on most
  machines.

  Some design possibilities here are:

  ~ Best Effort (Static)

    That expression is valid in some implementations (those
    with large number support) and invalid in others.  In 
    implementations where the number is invalid, it should be
    handled like a syntax error: detected and reported when the
    program is first loaded.

    The problem with that is that a generic tool, one which
    doesn't need the value of `x', can't process this 
    file unless it has large number support built in.
    That suggests another possibility:


  ~ Best Effort (Dynamic)

    The large integer is valid in implementations with 
    large number support.   In other implementations, it
    is valid if the application doesn't ask for its value, 
    invalid otherwise.

    This is better but subtly confused:  we've  confused the
    ability to represent an integer as an arithmetic type
    in the host environment with the ability to represent an
    integer generally.   It may seem obscure but, really, 
    applications don't necessarily need to be able to perform
    arithmetic with a number in order to make use of it.

    That suggests, finally:


  ~ All Numbers Valid

    Finally, we could just require implementations to 
    treat the definition of `x' as a valid number and
    return it as such to the application.   We don't
    need to require large number support for that:  
    in an implementation without large numbers, the
    value returned is just a (mostly) opaque "number"
    that happens not to be convertable to `int' 
    or `long'.


  That last choice seems to be the cleanest.  Fortuantely,
  it works out nicely in an API:



** API: Representiing Values

  [Here and elsewhere, API types and functions are just "sketched"
   rather than rendered down to precise C.]

  Some values require dynamically allocated memory for their
  representation (e.g,. a sequence).

  In general, the values in a program are "tree structured" -- one
  value may contain others but, collectively, the values form a tree.
  That is: there are no circular data structures possible in this
  language.

  Therefore:  the API has to include provisions for memory management.
  The API _might_ want to take advantage of the tree-structured nature
  of values.

  The "lifetimes" of values (i.e., when they should be freed relative
  to when they are allocated) really depends on what the application 
  is doing.    Consider a program like:

        (define kreatures '(beatles byrds #undefined))

  Suppose the application asks for the value of `kreatures' and
  then later asks for the first value in that sequence (`beatles').

  Some possibilities here are:

  ~ Free Everything at Once
  
    All of the values live equally long.  The application must later
    free the results from the entire program.  At that point, both
    `beatles' and the sequence `(beatles byrds #undefined)' would both
    be freed.

    That's simple as can be but strikes me as short-sighted.   What 
    happens if the program is very large but the application only
    wants one or two small parts of the values it contains?   A large
    amount of memory is then held to protect a very small number of
    values.

  ~ Free by Name

    Another idea is that the appliation can free every named binding
    independently.   When the application says "free the binding of
    `x'" that has the side effect of freeing `beatles'.

    This idea only partially solves the problem of "Free Everything at
    Once".   Furthermore, although we don't _currently_ allow two
    variables to be bound to a common value, it would be premature
    to rule out that possibility.   Overall, this seems unworkable.

  ~ Reference Counted Values

    That values are "tree structured" suggests that reference counting
    should be used to manage them.   If the application frees the
    sequence bound to `x', but still has a reference to `beatles',
    then `beatles' is not yet freed.

    The only problem with this idea is that it precludes the
    possibility of circular structures.   Nothing we have in 
    this language (yet) can create a circular structure -- bit would
    it be premature to preclude them in the API?

  ~ Graph-tracing Garbage Collection

    The fully general solution is to use generic, graph-tracing 
    garbage collection for values.   This approach removes all
    of the limitations of the alternatives and is clearly the
    most flexible answer.

  You might think I will therefore choose "Graph-tracing Garbage
  Collection".   I won't.

  There are two serious problems with graph-tracing garbage
  collection:

  1. It's hard to use in C.

     Interfaces to GC in C are notoriously awkward, confusing, 
     and or error-prone.

  2. On large and dynamic data sets, GC performance is hard to control

  Above all: the original goal of this language design was to make a
  very simple system for processing config files.  It should be simple
  to implement and simple to use.   Reaching the conclusion that a
  graph-tracing GC is needed for such applications seems a bit absurd,
  doesn't it?   If we decide we need graph-tracing, probably we made
  a mistake earlier in the design :-)

  So: I picked "reference counted values", thus imposing an
  interesting constraint on our value types as the langauge evolves.

  A sketch, glossing over the precise types used and how errors are 
  reported:


        enum xl_type
        {
          xl_symbol,
          xl_number,
          xl_string,
          xl_sequence,
          xl_assoc,
        };

        typedef <unspecified> Value;

        Value xl_make_symbol (char * name);
        Value xl_make_string (char * name);
        Value xl_make_number (char * name);
        Value xl_int_to_number (int n);
        Value xl_double_to_number (double n);
        Value xl_make_sequence (int n_elts, Value * elts);
        Value xl_make_assoc (int n_elts, Value * names, Value * values);

        enum xl_type xl_typeof (Value v);

        char * xl_as_string (Value v);
        int xl_as_int (Value v);
        double xl_as_double (Value v);
        Value xl_sequence_ref (Value v, int n);
        Value xl_assoc_ref (Value v, Value name);

        int xl_equal (Value a, Value b);

        xl_ref (Value v);
        xl_unref (Value v);

        Value xl_parse_program (char * source);
        Value xl_evaluate (Value program, Value name);


  I claim that implementing our language (which is evidently called
  "xl") is extremely straightforward.  Using it, with an interface
  like the above, is a breeze.


  For debugging assistence, if nothing else, we should also add 
  a printer:

        xl_write_value (int fd, Value v);

  and reader:

        Value xl_read_value (int fd, Value v)

  In fact, on the basis of those and the other functions, one could
  write a portable implementation of `xl_parse_program'.

  Has (modulo implementation) the generic configuration languaeg been 
  defined?  Stay tuned....



* Appendix: Current xl Grammar

  The configuration language so far has the grammar:


        <program> := <assignment>*
        <assignment> := "(" "define" <name> <value> ")"

        <name> := <initial> <subsequent>*
        <initial> := <letter>
        <subsequent> := <letter> | "-" | <digit>

        <value> := <number-exp> 
                | <symbol-exp>
                | <string-exp>
                | <sequence-exp>
                | <assoc-exp>
                | #undefined



        <nested-value> := <number-exp>
                       | <name>
                       | <string-exp>
                       | <sequence>
                       | <assoc>


        <number-exp> := ``a number in "the usual" notation''

        <symbol-exp> := "'" <name>
                     | "(" "quote" <name> ")"

        <string-exp> := ``a string in "the usual" quoted notation''

        <sequence-exp> := "'" <sequence>
                       |  "(" "quote" <sequence> ")"

        <sequence> := "(" <nested-value>* ")"

        <assoc-exp> := "'" <assoc>
                    |  "(" "quote" <assoc> ")"

        <assoc> := "{" <assoc-binding>* "}"

        <assoc-binding> := <name> "=>" <nested-value>
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [Gnu-arch-users] Re: new language, arch, furth, etc., (continued)
- [Gnu-arch-users] Re: new language, arch, furth, etc., Tom Lord <=
Prev by Date: Re: [Gnu-arch-users] Status of global and tree aliases
Next by Date: Re: [Gnu-arch-users] new language, arch, furth, etc.
Previous by thread: Re: [Gnu-arch-users] Re: new language, arch, furth, etc.
Next by thread: [Gnu-arch-users] Status of global and tree aliases
Index(es):
- Date
- Thread