bug-gnu-utils

Re: RFC: enum instead of #define for tokens


From: Hans Aberg
Subject: Re: RFC: enum instead of #define for tokens
Date: Wed, 3 Apr 2002 19:46:57 +0200

At 19:13 +0200 2002/04/03, Akim Demaille wrote:
>| 2.  Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is 255).

I think this is a non-issue, as nearly all machines use 8-bit bytes; those
that used, say, 9 bits are archaic. If one uses 16-bit C/C++ bytes, then
file I/O should also take place in those chunks according to the C/C++
standards.

My compiler does run on a platform where (or so a macro says) a byte has 16
bits, but given the file-I/O question, I wonder whether it really is
conforming on that platform.

>My position, probably not very nice, is that this should not happen.
>Passing chars (wchars) as tokens is wrong.  It was a nice little dirty
>trick to be able to `return '+'' in the scanner, and use '+' too in
>the parser, but that's not sane.  The parser should never see
>characters.

You are right, this is not very nice :-):

I think you are imposing your own programming style here.

I tweaked my bison.simple file so that when the parser encounters an
unknown character (recognized as a character by its token range), it prints
the character itself instead of just saying "undefined". One can then make
full use of the Flex catch-all rule
  . { return (unsigned char)yytext[0]; }

Very convenient: one spin-off is that such characters also get access to
the error-reporting system of the Bison parser.

>As a result, there is no such issue as a Unicode compliant parser.

Bison is already Yacc "char" compliant: named tokens start at 257, so
single characters pass through as their own token numbers. So I think there
should be a corresponding Unicode feature: Unicode has so many characters
that one needs a convenient way of handling them.

  Hans Aberg
