Tue, 12 Feb 2002 19:04:17 +0100
[Please keep the cc to Help Bison, as more good folks can help.]
At 17:02 +0300 2002/02/12, pecherin wrote:
>>Formally, the Bison generated parser reads token numbers provided by the
>>lexer and knows nothing about character codes, so from that point of view,
>>there is no difference between using one character encoding and another.
>Thanks. I think you are right.
>I looked at code generated by Bison and
>found a lot of char constants and it
There is another limitation, which I do not know whether you will hit, but
which you should be aware of when working with Unicode: both Bison and the
parser it generates use a "short" for state numbers. So if you put in a lot
of Unicode tokens which are parsed by different states, the number of
states might overflow that type.
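
As a rough illustration of that limit, here is a minimal sketch in C. The
helper name states_fit_in_short is made up for this example and is not part
of Bison; it only assumes that states are numbered from 0 and stored in a
signed "short", as described above.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not part of Bison: reports whether a parser with
   `nstates` states can number all of them in a signed short. */
bool states_fit_in_short(long nstates)
{
    /* States are assumed to be numbered 0 .. nstates - 1, so the largest
       state number must not exceed SHRT_MAX. */
    return nstates >= 1 && nstates - 1 <= SHRT_MAX;
}
```

SHRT_MAX is only guaranteed to be at least 32767, so a grammar that needs
more states than that may not be representable on a given compiler.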
The reason one might want to generate a lot of Unicode tokens is that the
C/C++ support for Unicode is really lousy, so those who write Unicode
applications that must build under several compilers (like WWW
servers/browsers) give the Unicode characters identifier names, and write
out the character codes explicitly.
This seems to be the only way to ensure portability right now.
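
A small sketch of that naming approach, in C. The identifier names and the
function is_named_letter are invented for this example; the numeric values
are the standard Unicode code points for those characters. The source stays
pure ASCII, so it does not depend on how a compiler treats non-ASCII input.

```c
/* Illustrative names only; each value is written out as an explicit
   Unicode code point rather than as a character literal. */
enum {
    U_LATIN_SMALL_A_WITH_DIAERESIS = 0x00E4,
    U_GREEK_SMALL_ALPHA            = 0x03B1,
    U_CYRILLIC_SMALL_A             = 0x0430
};

/* Hypothetical helper: returns 1 if the code point `cp` is one of the
   named characters above, 0 otherwise. */
int is_named_letter(unsigned long cp)
{
    switch (cp) {
    case U_LATIN_SMALL_A_WITH_DIAERESIS:
    case U_GREEK_SMALL_ALPHA:
    case U_CYRILLIC_SMALL_A:
        return 1;
    default:
        return 0;
    }
}
```

A lexer written this way could compare the code points it reads against
such names and hand the parser whatever token numbers the grammar defines.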
- UNICODE, pecherin, 2002/02/12
- Re: UNICODE, Hans Aberg, 2002/02/12
- Re: UNICODE, Hans Aberg <=