help-bison

Re: improving yysyntax_error()


From: Hans Aberg
Subject: Re: improving yysyntax_error()
Date: Thu, 21 Jun 2007 13:12:40 +0200

On the one hand, you are trying to use Bison for something it wasn't designed for, so unless you can come up with good motivations, the change is unlikely to happen. Remember that the development is done by volunteers who work on what they want.

On the other hand, one way to extend Flex & Bison to Unicode is to use UTF-8, and let Flex return a sequence of characters. Then this can be used in Bison, too, if 'c_1...c_k' expands to 'c_1'...'c_k'. The problem then is to print the full UTF-8 character in error messages, not just its leading byte. I do not know how to implement this, though.
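
As a rough illustration of the "full character, not just the leading byte" problem, here is a minimal C sketch. It assumes the error-reporting code still has access to the input bytes that follow the offending lead byte; the helper names utf8_seq_len and utf8_copy_char are hypothetical and are not part of Bison or Flex.

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte b,
   or 0 if b cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;                 /* 0xxxxxxx: ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;   /* 110xxxxx */
    if (b >= 0xE0 && b <= 0xEF) return 3;   /* 1110xxxx */
    if (b >= 0xF0 && b <= 0xF4) return 4;   /* 11110xxx */
    return 0;
}

/* Copy the whole character starting at in into out (NUL-terminated).
   Falls back to the single first byte if the sequence is malformed or
   truncated. Returns the number of bytes copied. */
static size_t utf8_copy_char(const char *in, char out[5])
{
    size_t n = utf8_seq_len((unsigned char)in[0]);
    size_t i;

    if (n == 0)
        n = 1;
    for (i = 1; i < n; ++i) {
        /* Every trailing byte must look like 10xxxxxx. A NUL terminator
           fails this check, so the loop never reads past the string end. */
        if (((unsigned char)in[i] & 0xC0) != 0x80) {
            n = 1;
            break;
        }
    }
    memcpy(out, in, n);
    out[n] = '\0';
    return n;
}
```

An error message could then quote the buffer filled by utf8_copy_char instead of the single byte that the parser saw as the unexpected token. This sketch only checks the byte pattern; it does not reject overlong or surrogate encodings.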

  Hans Aberg


On 20 Jun 2007, at 23:13, Christian Schoenebeck wrote:

Hi!

I would like to improve the quality of error messages produced by yysyntax_error(). I know the theory behind LALR(1) parsers, but unfortunately I'm not yet very familiar with the Bison skeleton parser implementation, so I hope you can help me a bit.

First the reason: I'm strictly opposed to splitting the lexer and parser tasks into two distinct worlds (for many reasons). So my parsers usually work like this: yylex() just returns the ASCII code of the next character from the input stream, and thus the Bison grammars include the typical, trivial lexer-side rules, e.g.:

SET : 'S''E''T' ;

SUBSCRIBE : 'S''U''B''S''C''R''I''B''E' ;
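
A minimal sketch of such a character-level yylex() might look as follows. It assumes the input is held in a NUL-terminated string; the names input_cursor and set_input are hypothetical helpers introduced only for this illustration. Returning 0 at the end follows Bison's convention for signalling end of input.

```c
/* Hypothetical: points at the next unread byte of the input. */
static const char *input_cursor = "";

/* Hypothetical helper: start scanning a new NUL-terminated string. */
static void set_input(const char *text)
{
    input_cursor = text;
}

/* Scanner-less yylex(): every input byte is its own token, so the
   grammar can match it with a character literal such as 'S'. */
static int yylex(void)
{
    unsigned char c = (unsigned char)*input_cursor;

    if (c == '\0')
        return 0;        /* end of input; the cursor is not advanced */
    ++input_cursor;
    return c;            /* the token code is the character code */
}
```

With the input "SET", successive calls return 'S', 'E', 'T' and then 0, which is exactly the token stream the rule SET : 'S''E''T' ; consumes.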

Now the problem is that if there's a syntax error within these trivial rules, the yysyntax_error() function will just report the next expected character. E.g. the input "SUBfoo" would result in the error message: "syntax error, unexpected 'f', expecting 'S'". Obviously, reporting the whole rule's symbol name would make more sense here, that is: "syntax error, unexpected 'SUBfoo', expecting 'SUBSCRIBE'".

So I thought about adding a new keyword to the Bison declaration section, e.g. "%atomic", like:

%atomic SET SUBSCRIBE

This would tell Bison that the rules of the listed non-terminal symbols are so trivial that they don't matter in e.g. error messages, and accordingly yysyntax_error() would report the expected non-terminal symbol name(s) instead of the expected next character.

What do you think about that suggestion in general?

To implement this, is there an easy way (e.g. by table lookup) in the Bison skeleton parser to retrieve the rule numbers of the expected upcoming reduction(s) for a given parser state? For example "bison -v" would show me:

state 13

  457 SET: 'S' . 'E' 'T'
  458 SUBSCRIBE: 'S' . 'U' 'B' 'S' 'C' 'R' 'I' 'B' 'E'

    'E'  shift, and go to state 52
    'U'  shift, and go to state 53

Thus after the input "S", it would expect an upcoming reduction of either rule 457 or rule 458. Once I have the rule numbers it's easy to resolve the human-readable symbol names in the skeleton, but so far I'm unsure how to get the rule numbers of the upcoming reductions. Any hints?
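
To make the desired lookup concrete, here is a small C sketch of what it could compute. It is not based on any table in Bison's generated parser: the kernel_items array is hand-written from the "bison -v" excerpt above, and the struct and the function describe_expected are hypothetical names. Whether the skeleton can obtain this information at all is exactly the open question.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical table: one entry per item of a parser state, giving the
   rule number and the name of the rule's left-hand-side symbol.
   Filled in by hand from the "bison -v" excerpt for state 13. */
struct item {
    int state;
    int rule;
    const char *lhs;
};

static const struct item kernel_items[] = {
    { 13, 457, "SET" },
    { 13, 458, "SUBSCRIBE" },
};

/* Write the left-hand-side names of all rules in progress in the given
   state into buf, joined by " or ". Returns the number of names written.
   Names that no longer fit into buf are dropped. */
static int describe_expected(int state, char *buf, size_t bufsize)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    if (bufsize == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < sizeof kernel_items / sizeof kernel_items[0]; ++i) {
        int n;

        if (kernel_items[i].state != state)
            continue;
        n = snprintf(buf + used, bufsize - used, "%s%s",
                     count ? " or " : "", kernel_items[i].lhs);
        if (n < 0 || (size_t)n >= bufsize - used) {
            buf[used] = '\0';   /* discard the partially written name */
            break;
        }
        used += (size_t)n;
        ++count;
    }
    return count;
}
```

For state 13 this yields the text "SET or SUBSCRIBE", which an %atomic-aware yysyntax_error() could print after "expecting" in place of the individual characters 'E' and 'U'.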

CU
Christian

