Re: RFC: enum instead of #define for tokens

From: Hans Aberg
Subject: Re: RFC: enum instead of #define for tokens
Date: Sat, 6 Apr 2002 00:01:11 +0200

At 11:54 -0800 2002/04/05, Paul Eggert wrote:
>> Is this cross compiler problem common?
>It depends on what you mean by "common".  If you use EBCDIC it's
>common.  If you use the non-ASCII part of ISO 8859-1 and are
>collaborating with someone else who's using some other character set
>in the ISO 8859 series, it's common.  Assuming Bison supports
>multibyte character sets properly (isn't that how we got started on
>this thread?), a similar problem occurs with the non-ASCII parts of
>UTF-8, EUC-JP, shift-JIS, etc.

The point is that the problem arises if you run Bison on one platform and
then transport the generated sources to another. But the problem may show up
somewhere anyway. -- One ends up with questions that ultimately have to do
with the failings of C/C++, not Bison.

>> -- Note that the problem does not exist for Unicode UTF-n encodings
>Only if everyone agrees to use that particular extension to ASCII.

I'm not sure what you mean here: if Bison has a Unicode feature to be
turned on, then it will work only for Unicode UTF-n, n >= 21, streams, but
those will agree on any platform; the compacted yytranslate[] table will be
the same on any platform. Further, Linux is evidently already using
UTF-32, so as far as GNU is concerned, it should be a non-issue.

It is probably only backwards MSOS that uses UTF-16; but that ain't GNU. If
one uses UTF-16 and no symbols requiring more than one 16-bit code unit,
then the yytranslate[] table will be the same as for UTF-n, n >= 21.

>> Note that one may want to use the yytranslate[] table as is if one is using
>> distributed programming, say a WWW-browser reading ASCII on an EBCDIC
>> computer.
>Yes, that's the sort of scenario I was worried about.

But here it is a desirable feature: compile the sources with Bison on the
ASCII platform only, and they will compile correctly on the EBCDIC computer.
The alternative would be to write sources like
  char ASCII_a = 0x61;
and then handwrite the lexer using that. This is what a guy writing a WWW
server told me he was doing. -- Extremely painful.

One ends up with the question of defining which encodings the parser and
lexer should be able to handle.

Under C++, this can be done by hooking a code converter onto the IO
streams. Thus, if one decides to settle for Unicode UTF-n, n >= 21,
internally in Flex/Bison, then the generated combined lexer/parser can be
made to parse any encoding by invoking the platform-specific code
converter: just compile the sources on, say, a Linux machine, which does it
correctly, and the compacted yytranslate[] table will be correct for
Unicode. On another platform, one then invokes the local code converter
from the favorite format to Unicode UTF-n.

  Hans Aberg
