Unicode, better error reporting

From: Hans Aberg
Subject: Unicode, better error reporting
Date: Thu, 12 Sep 2002 18:53:53 +0200

I will drop off some ideas, lest I forget them:

- In view of Flex's way of dealing with unmatched input, either echoing it
to the output or, via a catch-all rule ".", sending its code to the
Bison-generated parser, Bison needs a "Unicode" option. It would merely make
sure that there are no token numbers in the Unicode range. Unicode will
never create codes outside the range 0, ..., 2^21 - 1.
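
A minimal sketch of the check such an option might perform, using the
conservative 21-bit bound given above. The names UNICODE_LIMIT,
token_collides_with_unicode and first_free_token_number are hypothetical
and not part of Bison.

```c
#include <stdbool.h>

/* Hypothetical constant: one past the largest value a Unicode character
   code can take, using the bound 2^21 - 1 from the text. */
#define UNICODE_LIMIT (1L << 21)

/* True if a token number could be mistaken for a character code
   passed through by a "." rule in the lexer. */
bool token_collides_with_unicode(long token_number)
{
    return token_number >= 0 && token_number < UNICODE_LIMIT;
}

/* First token number a "Unicode" option could safely hand out
   to a named token. */
long first_free_token_number(void)
{
    return UNICODE_LIMIT;
}
```

With such an option, named tokens would be numbered from
first_free_token_number() upward, so the lexer could return any character
code unchanged.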

- It would be great if YYERRCODE were written into the .h header so that
the Flex-generated lexer can use it. Even better, one might have a
YYLEXERROR token which the lexer can return: on getting this token value,
Bison would generate an error without calling yyerror, expecting the lexer
to write out the error message instead.
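
The proposed behaviour can be sketched as follows. This is only an
illustration of the idea, not how the Bison skeleton is written: the value
chosen for YYLEXERROR, the counter yyerror_calls and the function
report_syntax_error are all assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical value for the proposed token; chosen for illustration. */
enum { YYLEXERROR = 0x200000 };

/* Counts how often the stand-in yyerror below was called. */
int yyerror_calls = 0;

/* Stand-in for the user's yyerror. */
void yyerror(const char *msg)
{
    ++yyerror_calls;
    fprintf(stderr, "%s\n", msg);
}

/* What the parser might do with a token it cannot accept.
   Returns 1 to signal that error recovery should start. */
int report_syntax_error(int token)
{
    /* For YYLEXERROR the lexer is assumed to have printed its own
       message already, so yyerror is skipped. */
    if (token != YYLEXERROR)
        yyerror("syntax error");
    return 1;
}
```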

- In error messages, it is annoying that one cannot get the actual text
identified by the lexer written out in the case where the token value
stands for many such text strings. For example,
  %token string_value "string"
would not actually write the text itself, only that an unexpected "string"
was found.

Now, it turns out to be tricky to find a general formula for such
printouts. For example, a string may need to be passed through an
encoding function in order to produce a nice printout. Further, different
types of tokens may need different treatments.

 - I was therefore led to the following idea: one is allowed to write
  %token string_value "string" { /* code */ }
The idea is that when the Bison parser is about to write out the name of
the token, it executes the code snippet instead.

The point is that this ought to be both simple to implement and
sufficiently general.
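
One way to picture what the generated parser could do with such snippets
is a per-token printer lookup. This is a sketch under assumptions: the
token numbers, the token_printer type and the functions printer_for and
describe_token are invented for the example, and print_string_value stands
in for the { code } attached to string_value.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical token numbers, as a generated header might define them. */
enum { string_value = 258, integer_value = 259 };

/* A printer writes a description of the offending token into buf. */
typedef void (*token_printer)(char *buf, size_t size, const char *text);

/* Plays the role of the code snippet attached to string_value. */
void print_string_value(char *buf, size_t size, const char *text)
{
    snprintf(buf, size, "string \"%s\"", text);
}

/* Look up the printer for a token; NULL means "use the plain name". */
token_printer printer_for(int token)
{
    switch (token) {
    case string_value: return print_string_value;
    default:           return NULL;
    }
}

/* Describe a token for an error message, preferring its own printer. */
void describe_token(char *buf, size_t size, int token,
                    const char *name, const char *text)
{
    token_printer p = printer_for(token);
    if (p)
        p(buf, size, text);
    else
        snprintf(buf, size, "%s", name);
}
```

Tokens without a snippet fall back to the name given in the %token
declaration, so existing grammars would keep their current messages.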

 - An alternative way, even easier to implement, is to make a yyerror
which, instead of taking a string (char*) as argument, is given the
information needed to create a proper error message: probably the token
numbers of the failing and expected tokens, and the yytname table. One can
then write one's own error message like
  int yyerror(...) {
    switch (...) {
      case string_value: /* selected token values */
      case ...:
        ...
    }
  }

The advantage of this approach is that one mainly needs to edit the
skeleton file in order to achieve it.
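
A fuller sketch of such a function, assuming one possible parameter list.
Everything here is hypothetical: the signature, the token numbers and the
token_names table, which stands in for yytname (the real yytname is
indexed by internal symbol number, which this example glosses over).

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical token numbers and name table for the example. */
enum { string_value = 0, integer_value = 1 };
const char *const token_names[] = { "\"string\"", "\"integer\"" };

/* Builds the message in buf from the failing token, its text and the
   expected tokens, instead of receiving a ready-made string.
   Returns the length of the message, as snprintf would. */
int yyparseerror(char *buf, size_t size, int failing, const char *text,
                 const int *expected, size_t n_expected,
                 const char *const *names)
{
    int len;
    switch (failing) {
    case string_value: /* selected token values: show the actual text */
        len = snprintf(buf, size, "unexpected string \"%s\"", text);
        break;
    default:           /* all others: show the plain token name */
        len = snprintf(buf, size, "unexpected %s", names[failing]);
        break;
    }
    /* Append the expected tokens while there is room left in buf. */
    for (size_t i = 0; i < n_expected && len >= 0 && (size_t)len < size; ++i)
        len += snprintf(buf + len, size - len, "%s%s",
                        i == 0 ? ", expecting " : " or ",
                        names[expected[i]]);
    return len;
}
```

Writing into a caller-supplied buffer is only a convenience for the
example; a real version could just as well print directly.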

 - Perhaps the function above should be called yyparseerror, with
different functions being called for different types of errors. Then it is
easy to customize.

  Hans Aberg
