help-flex

Re: Flex recognition of "split" keywords


From: Hans Aberg
Subject: Re: Flex recognition of "split" keywords
Date: Sun, 20 Jul 2003 19:12:59 +0200

At 10:55 -0400 2003/07/20, John R Levine wrote:
>> Any other Fortran quirks coming to your mind?
>
>Many.  The unusual problem with Fortran is that the token boundaries are
>context dependent.
...
[examples]
...
>I remember that when I wrote INfort (one of the first f77 compilers,
>contemporary with f77) I had at least 12 different lexer kludges, but I
>don't remember now, 25 years later, what they all were.

So you, too, found it better to write a lexer for a different language. :-)

I looked a bit at your examples, and they make me wonder why anybody would
invent such a language, especially the feature where spaces are prescanned
out. What would be the advantage of that? It surely does not help human
parsing.

Otherwise, mild context dependencies are not difficult to handle at the
lexer/parser level, by the use of Flex start conditions or by some
grammar tweaks. For example:

>       a = 10 e2               // 10e6 is a number
>
>       do 10 e2 = 1,10 e3      // 10 is a statement number, e6 is
>                               // a name, 10e3 is a number
(Your comments are using e6 instead of e2. :-) )

I ended up admitting numbers as well as identifiers as statement
identifiers, because it is common to use numbers when numbering, say,
proof lines in a formal proof. The way I solved it is by using Flex rules
(example simplified):
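  /* Flex takes the longest match; on a tie the earlier rule wins,
     so "10" is an integer, "abc" an identifier, and "a1" a label. */
[[:digit:]]+ { get_text; record_value; return unsigned_integer_value; }
[[:alpha:]]+ { get_text; return identifier_key; }
[[:alnum:]]+ { get_text; return label_key; }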

The parser can then pick up the text of the integer token in the places
where a statement identifier is expected.
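
As a rough illustration of how the three rules above sort out their
overlaps, here is a small Python model of Flex's matching policy (longest
match, earliest rule on a tie). It is only a sketch: the function name
classify is made up for this example, and the ASCII character classes are
an approximation of [[:digit:]], [[:alpha:]] and [[:alnum:]].

```python
import re

# Same order as the Flex rules; the order matters for tie-breaking.
RULES = [
    ("unsigned_integer_value", re.compile(r"[0-9]+")),
    ("identifier_key", re.compile(r"[A-Za-z]+")),
    ("label_key", re.compile(r"[A-Za-z0-9]+")),
]


def classify(text):
    """Return (token kind, matched text) for the start of `text`.

    Hypothetical helper: models longest-match with first-rule tie-break.
    """
    best_kind, best_len = None, 0
    for kind, pattern in RULES:
        m = pattern.match(text)
        # Strict '>' keeps the earlier rule when two matches are equally long.
        if m and m.end() > best_len:
            best_kind, best_len = kind, m.end()
    return best_kind, text[:best_len]
```

With this model, classify("10") gives an unsigned_integer_value even though
the label_key pattern matches the same text, because the integer rule is
listed first; a parser that wants a statement identifier can accept any of
the three kinds and use the text.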

A similar method might be used in the example above, i.e., the parser would
get the tokens "10" and "e2" and decide whether they form a single number or
remain two separate tokens. For example:
[[:digit:]]+    { get_text; record_value; return unsigned_integer_value; }
"e"[[:digit:]]+ { get_text; record_value; return unsigned_exponent_value; }
[[:alnum:]]+    { get_text; return label_key; }

The .y grammar might contain:

do_stuff:
    "do" do_label {}
;
do_label:
    unsigned_exponent_value { $$.text_ = $1.text_; }
  | label_key               { $$.text_ = $1.text_; }
;

This way the "contexts" are sorted out by the parser, not the lexer.
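
To make the idea concrete, here is a minimal Python sketch of a parser-side
decision, assuming the lexer has already delivered "10" and "e2" as separate
tokens. The helpers lex, read_operand and read_do_header are hypothetical
names for this illustration and do not correspond to anything in Flex or
Bison; real Fortran DO statements need considerably more than this.

```python
import re

INTEGER = "unsigned_integer_value"
EXPONENT = "unsigned_exponent_value"
LABEL = "label_key"


def lex(word):
    """Classify one word the way the three Flex rules above would."""
    if re.fullmatch(r"[0-9]+", word):
        return (INTEGER, word)
    if re.fullmatch(r"e[0-9]+", word):
        return (EXPONENT, word)
    if re.fullmatch(r"[A-Za-z0-9]+", word):
        return (LABEL, word)
    raise ValueError(f"unrecognized token: {word!r}")


def read_operand(tokens):
    """Expression context: integer followed by exponent is one number."""
    (kind, text), rest = tokens[0], tokens[1:]
    if kind == INTEGER and rest and rest[0][0] == EXPONENT:
        return float(text + rest[0][1]), rest[1:]
    if kind == INTEGER:
        return int(text), rest
    return text, rest  # anything else is treated as a name


def read_do_header(tokens):
    """DO context: the integer is a statement label, the next token a name."""
    (kind, label), (_, name) = tokens[0], tokens[1]
    if kind != INTEGER:
        raise ValueError("expected a statement label after 'do'")
    return label, name
```

The same token pair is read differently depending on which function the
parser is in: read_operand([lex("10"), lex("e2")]) combines the pair into
the number 10e2, while read_do_header on the same tokens returns the label
"10" and the name "e2".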

  Hans Aberg





