help-flex
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Locations suggest


From: Akim Demaille
Subject: Re: Locations suggest
Date: 03 Jan 2002 15:48:09 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)

>>>>> "Hans" == Hans Aberg <address@hidden> writes:

Hans> At 16:30 -0800 2001/12/31, John W. Millaway wrote:
>>> One should perhaps equip Flex with better token location
>>> recording, for use with error messages, etc:
>> Good idea.

Hans> Thank you. :-)

I think it is not a good because there are too many different ways
locations have to be computed.  And yet for the sole yylineno, flex is
not satisfying.  Rather, you should depend on the various hooks to
have your scanner compute what you need.  See the following.

----------------------------------------------------------------------

Advanced Use of Flex
--------------------

   In this section we will develop a scanner for arithmetics, which will
later be used together with a Bison generated parser to implement an
alternative implementation of M4's `eval' builtin, see Bison, ylparse.y
(*FIXME:* Ref Bison, ylparse.y.).  Our project is composed of:

`yleval.h'    a header common to all the files,
`ylscan.l'    the scanner for arithmetics
`ylparse.y'   the parser for arithmetics (*FIXME:* ref.).
`yleval.c'    the driver for the whole module (*FIXME:*
              ref.).

   Because locations are extremely important in error messages, we will
look for absolute preciseness: we will not only track the line and
column where a token starts, but also where it ends.  Maintaining them
by hand is tedious and error prone, so we will insert actions at
appropriate places for Flex to maintain them for us.  We will rely on
Bison's notion of location:

     typedef struct yyltype
     {
       int first_line, first_column, last_line, last_column;
     } yyltype;

which we will handle thanks to the following macros:

 - Macro: LOCATION_RESET (LOCATION)
     Initialize the LOCATION: first and last cursor are set to the
     first line, first column.

 - Macro: LOCATION_LINE (LOCATION, NUM)
     Advance the end cursor of NUM lines, and of course reset its
     column.  A macro `LOCATION_COLUMN' is less needed, since it would
     consist simply in increasing the `last_column' member.

 - Macro: LOCATION_STEP (LOCATION)
     Move the start cursor to the end cursor.  This is used when we
     read a new token.  For instance, denoting the start cursor `S' and
     the end cursor `E', we move from

                1000 + 1000
                ^  ^
                S  E

     to

                1000 + 1000
                   ^
                  S=E

 - Macro: LOCATION_PRINT (FILE, LOCATION)
     Output a human readable representation of the LOCATION to the
     stream FILE.  This hairy macro aims at providing simple locations
     by factoring common parts: if the start and end cursors are on two
     different lines, it produces `1.1-2.3'; otherwise if the location
     is wider than a single character it produces `1.1-3', and finally,
     if the location designates a single character, it results in `1.1'.

   Their code is part of `yleval.h':

     /* Initialize LOC. */
     # define LOCATION_RESET(Loc)                  \
       (Loc).first_column = (Loc).first_line = 1;  \
       (Loc).last_column =  (Loc).last_line = 1;
     
     /* Advance of NUM lines. */
     # define LOCATION_LINES(Loc, Num)             \
       (Loc).last_column = 1;                      \
       (Loc).last_line += Num;
     
     /* Restart: move the first cursor to the last position. */
     # define LOCATION_STEP(Loc)                   \
       (Loc).first_column = (Loc).last_column;     \
       (Loc).first_line = (Loc).last_line;
     
     /* Output LOC on the stream OUT. */
     # define LOCATION_PRINT(Out, Loc)                               \
       if ((Loc).first_line != (Loc).last_line)                      \
         fprintf (Out, "%d.%d-%d.%d",                                \
                  (Loc).first_line, (Loc).first_column,              \
                  (Loc).last_line, (Loc).last_column - 1);           \
       else if ((Loc).first_column < (Loc).last_column - 1)          \
         fprintf (Out, "%d.%d-%d", (Loc).first_line,                 \
                  (Loc).first_column, (Loc).last_column - 1);        \
       else                                                          \
         fprintf (Out, "%d.%d", (Loc).first_line, (Loc).first_column)


     *Example 6.14*: `yleval.h' (i) - Handling Locations



   Because we want to remain in the `yleval_' name space, we will use
`%option prefix', but this will also rename the output file.  Because
we use Automake which expects `flex' to behave like Lex, we use
`%option outfile' to restore the Lex behavior.

     %option debug nodefault noyywrap nounput
     %option prefix="yleval_" outfile="lex.yy.c"
     
     %{
     #if HAVE_CONFIG_H
     #  include <config.h>
     #endif
     #include <m4module.h>
     #include "yleval.h"
     #include "ylparse.h"

     *Example 6.15*: `ylscan.l' - Scanning Arithmetics



   Our strategy to track locations is simple, see *Note Flex Actions::.
Each time `yylex' is invoked, we move the first cursor to the last
position thanks to the USER-YYLEX-PROLOGUE.  Each time a rule is
matched, we advance the ending cursor of `yyleng' characters, except
for the rule matching a new line.  This is performed thanks to
`YY_USER_ACTION'.  Each time we read insignificant characters, such as
white spaces, we also move the first cursor to the latest position.
This is done in the regular actions:

     /* Each time we match a string, move the end cursor to its end. */
     #define YY_USER_ACTION  yylloc->last_column += yyleng;
     %}
     %%
     %{
       /* At each yylex invocation, mark the current position as the
          start of the next token.  */
       LOCATION_STEP (*yylloc);
     %}
       /*  Skip the blanks, i.e., let the first cursor pass over them.  */
     [\t ]+     LOCATION_STEP (*yylloc);
     \n+        LOCATION_LINES (*yylloc, yyleng); LOCATION_STEP (*yylloc);

The case of the keywords is straightforward and boring:

     "+"        return PLUS;
     "-"        return MINUS;
     "*"        return TIMES;
     ...

Integers are more interesting: we use `strtol' to convert a string of
digits into an integer.  The result is stored into the member `number'
of the variable `yylval', provided by Bison via `ylparse.h'.  We
support four syntaxes: `10' is decimal (equal to... 10), `0b10' is
binary (2), `010' is octal (8), and `0x10' is hexadecimal (16).  Notice
the risk of reading `010' as a decimal number with the naive pattern
`[0-9]+'; you can either improve the regular expression, or rely on the
order of the rules(1).  We chose the latter.

       /* Binary numbers. */
     0b[01]+   yylval->number = strtol (yytext + 2, NULL, 2); return NUMBER;
       /* Octal numbers. */
     0[0-7]+   yylval->number = strtol (yytext + 1, NULL, 8); return NUMBER;
       /* Decimal numbers. */
     [0-9]+    yylval->number = strtol (yytext, NULL, 10); return NUMBER;
       /* Hexadecimal numbers. */
     0x[:xdigit:]+ yylval->number = strtol (yytext + 2, NULL, 16); return 
NUMBER;

   Finally, we include a catch-all rule for invalid characters: report
an error but do not return any token.  In other words, invalid
characters are neutralized by the scanner:

       /* Catch all the alien characters. */
     .   {
           yleval_error (yycontrol, yylloc, "invalid character: %c", *yytext);
           LOCATION_STEP(*yylloc);
         }
     %%

where `yleval_error' is a variadic function (as is `fprintf') and
`yycontrol' a variable that will be both defined later.


   This scanner is complete, it merely lacks its partner: the parser.
But this is yet another chapter...

   ---------- Footnotes ----------

   (1) Note that the two solutions proposed are _not_ equivalent!  Spot
the difference between

     [1-9][0-9]*  return NUMBER;
     0[0-7]+      return NUMBER;

and

     [0-9]+    return NUMBER;
     0[0-7]+   return NUMBER;



reply via email to

[Prev in Thread] Current Thread [Next in Thread]