Re: RFC: enum instead of #define for tokens

From: Akim Demaille
Subject: Re: RFC: enum instead of #define for tokens
Date: 04 Apr 2002 12:19:04 +0200
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)

>>>>> "Hans" == Hans Aberg <address@hidden> writes:

>> My position, probably not very nice, is that this should not
>> happen.  Passing chars (wchars) as tokens is wrong.  It was a nice
>> little dirty trick to be able to `return '+'' in the scanner, and
>> use '+' too in the parser, but that's not sane.  The parser should
>> never see characters.

Hans> You are right, this is not very nice :-):

IMHO, my position is only unkind to people who are abusing the system.
The example of Unicode demonstrates how bad it was to let chars be
tokens.  That default is very C-specific; I really doubt that such an
atrocity survives in the native Yaccs of other languages.  But I
confess I don't know.

Hans> I think you are imposing your own programming style here.

Not quite.  I'm imposing the theory that lies behind the scenes.

Hans> I tweaked my bison.simple file so that when it encounters an
Hans> unknown character (known as a character by its range), it writes
Hans> it out, instead of just saying "undefined". One then can make
Hans> full use of the Flex . { return (unsigned char)yytext[0]; }
Hans> rule.

Scanning errors ought to be caught by the scanner, not the parser.

Hans> Very convenient: One spin-off is that one gets access to the
Hans> error reporting system of the Bison parser also for such
Hans> characters.

If you want Bison to give access to the $undefined token, I'm ready
to do that.  Then the scanner may return this token.  And nothing
prevents this token from having a value: the offending string, which
can be used in error messages.

>> As a result, there is no such issue as a Unicode compliant parser.

Hans> Bison is already Yacc "char" compliant, starting at 257. So I
Hans> think there should be a corresponding Unicode feature: Unicode
Hans> has so many characters, that one needs a convenient way of
Hans> handling them.

In any case, this whole discussion does not take into account the
impact that Unicode can have on the size of the Bison tables.  From a
theoretical point of view, I'm very much against Unicodization; from a
practical point of view, I'm not even sure it is doable.  And most
importantly, I'm sure that if it's done in the scanner, these problems
vanish.

Yacc and Lex, as parsers and scanners, benefited from a clean
separation of the two different tasks.  We should keep these tasks
separate.

Note that this is a very different matter from the input formalism.
We can very well imagine a FlexBison that takes a single file as input
for both the scanner and the parser.  But there would still be a parser
and a scanner in the output.  Even so-called scannerless parsers do
have a separation somewhere between the lexical and the syntactic.
