Re: Compiling Reverse Polish Calculator Example
From: Hans Aberg
Subject: Re: Compiling Reverse Polish Calculator Example
Date: Thu, 1 Nov 2012 20:54:26 +0100
On 1 Nov 2012, at 18:34, Akim Demaille wrote:
> Hi Hans,
Hi Akim,
> Le 31 oct. 2012 à 15:47, Hans Aberg a écrit :
>
>> It is pointless in UTF-8, and accepting it encourages a number of other
>> problems.
>> https://en.wikipedia.org/wiki/Byte_order_mark
>
> You are right that Bison wants at least to be able to read
> the ASCII part of the 8 bits, so that sort-of means UTF-8,
> if we consider that Latin 1 and the like are dead.
Strictly speaking, isn't Bison just reading 8-bit bytes, interpreting only the
ASCII range and forwarding whatever is in strings unchanged, given that not all
byte (octet) sequences are valid UTF-8? Modulo diagnostics in foreign languages.
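To illustrate the point that not every octet sequence is UTF-8, here is a
minimal Python sketch. The helper name is_valid_utf8 is made up for this
example and has nothing to do with Bison's sources; it only relies on Python's
strict UTF-8 decoder rejecting malformed input.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# "café" encoded as UTF-8: the é is the two-byte sequence C3 A9.
print(is_valid_utf8(b"caf\xc3\xa9"))
# "café" encoded as Latin-1: a lone E9 byte is a truncated UTF-8 sequence.
print(is_valid_utf8(b"caf\xe9"))
# Pure ASCII is always valid UTF-8.
print(is_valid_utf8(b"%token NUM"))
```

A tool that only interprets the ASCII range and passes other bytes through
never needs such a check; it would matter only if the file were required to be
UTF-8 as a whole.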
> If we were to ignore the BOM, then at least we should check
> that they match UTF-8, and reject the file otherwise?
Perhaps issue a better error message, saying that this is a BOM. It shows up
(as mentioned at the URL above) because some stray editors add it by default,
so it might be better to prohibit it, to encourage users to set up their
editors correctly.
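A rough sketch of what such a check could look like, in Python rather than
Bison's C. The function name check_leading_bom and the wording of the message
are assumptions for illustration, not Bison's actual diagnostics; only
codecs.BOM_UTF8 (the bytes EF BB BF) comes from the standard library.

```python
import codecs
from typing import Optional


def check_leading_bom(data: bytes, filename: str = "<input>") -> Optional[str]:
    """Return an error message if `data` starts with a UTF-8 BOM, else None.

    Hypothetical helper: the message format is invented for this sketch.
    """
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return (
            f"{filename}: error: file starts with a UTF-8 byte order mark "
            "(BOM); configure your editor to save without it"
        )
    return None


# A grammar file saved by an editor that prepends a BOM.
print(check_leading_bom(codecs.BOM_UTF8 + b"%token NUM\n", "rpcalc.y"))
# The same file without the BOM passes the check.
print(check_leading_bom(b"%token NUM\n", "rpcalc.y"))
```

Rejecting the file with a message that names the BOM tells the user what to
fix, instead of a generic "invalid character" error on the first byte.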
> FWIW, the D compilers for instance obey these BOM, including for
> other codings than UTF-8.
As for UTF-8 on POSIX/UNIX, I think use of the BOM is strongly discouraged, for
various reasons. A philosophical one is that UTF-8 was originally invented to
be a Unix encoding, and Unix largely handles text streams; a BOM is a
contextual marker (it must come first in the stream), which breaks that idea.
Hans