help-flex

Re: Flex recognition of "split" keywords


From: Hans Aberg
Subject: Re: Flex recognition of "split" keywords
Date: Sun, 20 Jul 2003 16:11:19 +0200

At 14:03 -0400 2003/07/19, John R Levine wrote:
>Actually, you need to make a prepass over each statement to decide whether
>it's an assignment or something else before you can determine token
>boundaries.
>
>       do 10 i = 1.10
>
>       do 10 i = 1,10
>
>In the first statement, do 10 i is one token, in the second statement it's
>three.  The prescan needs to see whether the statement contains an equal
>sign followed by a comma not contained in parentheses nor in a quoted or
>hollerith string.  Lex isn't very good at that kind of thing.
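
As a rough illustration of the prescan described above, the check could be
sketched in C as follows. The function name is hypothetical, the statement
is assumed to be already joined into one NUL-terminated string, and
Hollerith constants are deliberately not handled here, so this is only a
partial sketch of the test, not a complete Fortran prescan.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true if the statement contains an '='
 * outside parentheses and quotes that is followed, later on, by a ','
 * that is also outside parentheses and quotes.
 * Assumption: Hollerith constants (e.g. 5HHELLO) are NOT recognized. */
bool has_top_level_eq_then_comma(const char *stmt)
{
    int depth = 0;        /* current parenthesis nesting depth      */
    char quote = 0;       /* active quote character, or 0 if none   */
    bool seen_eq = false; /* a top-level '=' has been seen          */

    for (const char *p = stmt; *p != '\0'; ++p) {
        char c = *p;
        if (quote != 0) {
            /* Inside a quoted string: only the closing quote matters.
             * A doubled quote closes and immediately reopens it. */
            if (c == quote)
                quote = 0;
            continue;
        }
        switch (c) {
        case '\'':
        case '"':
            quote = c;
            break;
        case '(':
            depth++;
            break;
        case ')':
            if (depth > 0)
                depth--;
            break;
        case '=':
            if (depth == 0)
                seen_eq = true;
            break;
        case ',':
            if (depth == 0 && seen_eq)
                return true;
            break;
        default:
            break;
        }
    }
    return false;
}
```

With this sketch, "do 10 i = 1,10" is classified as a DO statement, while
"do 10 i = 1.10" and "x = f(1,2)" are not.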

It might be useful to add such a thing to Flex: I have encountered similar
situations here and there. The way I resolved it with Flex is to add a pipe
(a queue) of tokens; whenever the Flex lexer is called again, it first
checks whether the pipe is empty and decides what to return based on that.
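
A minimal sketch of such a token pipe in C might look like the following.
All names here (token_pipe, next_token, lookahead_token, demo_lex) are
made up for illustration; demo_lex merely stands in for the yylex() that
Flex generates, and tokens are assumed to be non-negative ints.

```c
#include <stddef.h>

#define PIPE_CAP 16

/* Fixed-size FIFO of pending tokens (hypothetical type). */
typedef struct {
    int buf[PIPE_CAP];
    size_t head;   /* index of the oldest queued token */
    size_t count;  /* number of queued tokens          */
} token_pipe;

/* Append a token; returns -1 if the pipe is full, 0 otherwise. */
int pipe_push(token_pipe *p, int tok)
{
    if (p->count == PIPE_CAP)
        return -1;
    p->buf[(p->head + p->count) % PIPE_CAP] = tok;
    p->count++;
    return 0;
}

/* Remove and return the oldest token; caller ensures count > 0. */
int pipe_pop(token_pipe *p)
{
    int tok = p->buf[p->head];
    p->head = (p->head + 1) % PIPE_CAP;
    p->count--;
    return tok;
}

/* Scan one token ahead and keep it queued for later delivery.
 * Returns the token, or -1 if the pipe has no room left. */
int lookahead_token(token_pipe *p, int (*raw_lex)(void))
{
    if (p->count == PIPE_CAP)
        return -1;
    int tok = raw_lex();
    pipe_push(p, tok);
    return tok;
}

/* What the parser calls: drain the pipe first, then scan afresh. */
int next_token(token_pipe *p, int (*raw_lex)(void))
{
    if (p->count > 0)
        return pipe_pop(p);
    return raw_lex();
}

/* Stand-in for the generated scanner: yields 1, 2, 3, then 0. */
int demo_lex(void)
{
    static int n = 0;
    return n < 3 ? ++n : 0;
}
```

The parser would call next_token() in place of yylex(); code that needs
to look ahead calls lookahead_token() and may push extra (virtual) tokens
with pipe_push() before the parser sees them.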

One such situation is the Bison grammar (i.e., the grammar of the Bison
language itself; the implementation of Bison now uses Flex and an earlier
version of Bison to produce its lexer and parser). It allows the ";" after
each set of grammar rules & actions to be omitted. Such a thing is hard to
handle with the LALR(1) grammars that Bison uses. Thus such a pipe might
be better, scanning forward to see if a virtual ";" should be inserted.
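
To make the idea concrete, here is a small C sketch that works on an
already scanned token array. It assumes that an identifier immediately
followed by ":" starts a new rule, and inserts a virtual ";" in front of
it when the previous rule was left unterminated. The token codes and the
function name are hypothetical and are not taken from the Bison sources.

```c
#include <stddef.h>

/* Hypothetical token codes for a grammar-file scanner. */
enum { T_ID = 1, T_COLON, T_SEMI, T_OTHER };

/* Copy in[0..n) to out, inserting a virtual T_SEMI before each
 * "T_ID T_COLON" pair that begins a new rule while the preceding
 * output token is not already T_SEMI.
 * out must have room for 2*n tokens. Returns the number written. */
size_t insert_virtual_semicolons(const int *in, size_t n, int *out)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        /* One token of lookahead decides whether a rule starts here. */
        int starts_rule = in[i] == T_ID
                       && i + 1 < n
                       && in[i + 1] == T_COLON;
        if (starts_rule && m > 0 && out[m - 1] != T_SEMI)
            out[m++] = T_SEMI;   /* the virtual ";" */
        out[m++] = in[i];
    }
    return m;
}
```

For the token sequence of "a : b c  d : e ;" this yields the sequence of
"a : b c ; d : e ;". In a live scanner the same decision would be made
with one token of lookahead held in the pipe instead of a whole array.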

I am thinking about another example right now, as I write: I want to
implement set notation in a theorem verifier based on metamathematics that
I am writing. Then I need to distinguish between
 (1)   {x| A}
 (2)   {x, y}
The problem is that (1) is a binder that binds just a name "x", thus
creating a bound variable, whereas in (2), "x" must be an already declared
variable or constant. It is legal to have constructs such as
    free x, y  formula A.
      ... |- ... all z: z in {x, y} and z in {x| A}.
thus the first occurrence of "x" is a free variable and the second is a
variable bound by the set expression.

Thus, it seems that I have to scan forward to see whether the token after
the "x" in (1) or (2) is a "|", a ",", or a "}". If it is a "|", I will
know that "x" is a name; if it is a "," or a "}", then "x" is an already
declared variable, and its declaration determines the token type returned.
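
The decision itself is small; a sketch in C, assuming the scanner has just
read the identifier that follows "{" and can inspect the rest of the
input, might be the following. The enum and function names are invented
for the example, and the check is meant only for the first identifier
after "{" (elsewhere a "|" may begin something else, such as "|-").

```c
#include <ctype.h>

/* Hypothetical classification results. */
enum set_tok { TOK_NAME, TOK_VARIABLE, TOK_ERROR };

/* rest points just past the identifier that follows '{'.
 * '|' means the identifier is a fresh name being bound;
 * ',' or '}' means it must be an already declared variable
 * or constant. Anything else is reported as an error here. */
enum set_tok classify_first_identifier(const char *rest)
{
    /* Skip blanks between the identifier and the next token. */
    while (isspace((unsigned char)*rest))
        rest++;

    switch (*rest) {
    case '|':
        return TOK_NAME;
    case ',':
    case '}':
        return TOK_VARIABLE;
    default:
        return TOK_ERROR;
    }
}
```

For "{x| A}" the text after "x" is "| A}", giving TOK_NAME; for "{x, y}"
it is ", y}", giving TOK_VARIABLE.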

>> Perhaps it is implementable in Flex (if now Flex would want to support that
>> language :-)):
>
>I have no doubt that by adding enough C code you can do some of the
>scanning in lex, but the total amount of code would be larger than an
>ad-hoc scanner.

I am thinking more of pinning down a set of features that might be of
general interest to other, non-Fortran implementors, and that might help
to make that C code (largely) unnecessary.

The idea of a prescan falls into a general pattern: One usually uses a
lexer and a parser, which from the formal point of view both parse
grammars. They are hooked up in a pipe so that the sentences of the lexer
language become the words of the parser language. The prescan hooks up
another language parser before the lexer, and its sentences become the
words of the lexer language.

And the need to scan forward over tokens in a pipe, and then determine
from that which tokens to return, seems to come up every once in a while.
It should at least be easy to implement with a Flex scanner, I think.

Any other Fortran quirks coming to your mind?

  Hans Aberg





