Re: RFC: enum instead of #define for tokens

From: Anthony DeRobertis
Subject: Re: RFC: enum instead of #define for tokens
Date: 04 Apr 2002 13:20:14 -0500

On Thu, 2002-04-04 at 05:19, Akim Demaille wrote:

> IMHO, my position is not nice wrt people who are abusing the system.
> The example of Unicode demonstrates how bad it was to let chars be
> tokens.  That default is very C specific, I really doubt that in other
> languages, such an atrocity remains in their native Yaccs.  But I
> confess I don't know.

Unicode really doesn't disagree with having tokens that happen to be in
U+0000 to U+00FF. It doesn't disagree with having some simple way of
representing those, either. We could very well call the single quotes in
%token '=' the Unicode big-endian low-octet token creator operator. The
acronym is quite interesting ;-)

Things like %token '=', %token ',', etc. are really useful for
programming languages. I agree that if people want some weird
native-language equals sign (is there such a thing?) or one of the nice
mathematical symbols, they should have their scanner deal with that.

I find rules like:
        assignment:  lvalue '=' rvalue;
to be pretty clear and terse, while:
        assignment:  lvalue EQUALS rvalue;
is lacking, at least in terseness. I also fear people would start using
EQ or EQA. The same goes for other common ones; I wonder what kind of
butchery will be done to 'left parenthesis', 'asterisk', and
'multiplication sign'?
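
To make the '=' versus EQUALS point concrete, here is a minimal sketch of
a hand-written scanner in which a single character is its own token code
and only multi-character tokens get names. The names TOK_IDENT,
TOK_NUMBER and next_token are hypothetical, and starting the named tokens
at 258 is an assumption of the sketch; all that matters is that they sit
above 255 so they cannot collide with a character.

```c
#include <ctype.h>

/* Named tokens live above 255 so any single char can be its own code.
   The names and the starting value 258 are assumptions of this sketch. */
enum token_code {
    TOK_EOF = 0,
    TOK_IDENT = 258,   /* hypothetical named token */
    TOK_NUMBER         /* 259 */
};

/* Toy scanner: returns the next token code and advances *cursor. */
static int next_token(const char **cursor)
{
    const char *p = *cursor;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0') {
        *cursor = p;
        return TOK_EOF;
    }
    if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p++;
        *cursor = p;
        return TOK_IDENT;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        *cursor = p;
        return TOK_NUMBER;
    }
    /* Anything else is a one-character token such as '=' or ','. */
    *cursor = p + 1;
    return (unsigned char)*p;
}
```

With this layout the grammar can say '=' and the scanner simply returns
the character, so nobody has to invent EQ, EQA or similar names.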

> Not quite.  I'm imposing the theory that goes under the scene.

Theoretically, token numbers are irrelevant. Practically, table size
matters ;-)
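
A back-of-the-envelope sketch of why table size matters, assuming a flat
lookup table indexed directly by token code with one entry per possible
code. The function name translate_table_bytes and the entry sizes are
illustrative assumptions, not a description of what Bison actually emits.

```c
#include <stddef.h>

/* Bytes needed for a flat table with one entry per token code
   in the range 0..max_code (inclusive). */
static size_t translate_table_bytes(unsigned long max_code, size_t entry_size)
{
    return ((size_t)max_code + 1) * entry_size;
}
```

Under those assumptions, codes up to 300 with one-byte entries need 301
bytes, while letting every code point up to U+10FFFF be a token with
two-byte entries needs 0x110000 * 2 = 2228224 bytes.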

> All this discussion is anyway not taking into account the impact that
> Unicode can have on the size of the Bison tables.  From the theory
> point of view, I'm very much against Unicodization, from the practical
> point of view, I'm not even sure it is doable.  And most importantly,
> I'm sure that if it's done in the scanner, these problems vanish.

Here we agree fully. I'd also like to note that the $&^@( at the
Unicode Consortium failed to provide a fixed-width character encoding.

As far as I've been able to tell, even UCS-4 isn't. It's pretty close
though, which is why I guess it isn't Unicode :-(
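
One way to read the UCS-4 remark, and this reading is an assumption, is
that combining marks mean one user-perceived character can span several
32-bit code points. The sketch below counts base characters in a UCS-4
buffer; the helper names are hypothetical and only the Combining
Diacritical Marks block (U+0300 to U+036F) is checked, which real code
would replace with proper Unicode property tables.

```c
#include <stddef.h>
#include <stdint.h>

/* True for code points in the Combining Diacritical Marks block.
   This single range is a simplification made for the sketch. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}

/* Counts code points that start a new user-perceived character,
   skipping marks that attach to the preceding code point. */
static size_t count_base_characters(const uint32_t *text, size_t len)
{
    size_t count = 0;

    for (size_t i = 0; i < len; i++)
        if (!is_combining_mark(text[i]))
            count++;
    return count;
}
```

Both { 0x00E9 } and { 0x0065, 0x0301 } spell the same e-acute and count
as one character here, even though the second takes two UCS-4 units.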
