help-gnu-emacs

Re: those funny non-ASCII characters


From: rusi
Subject: Re: those funny non-ASCII characters
Date: Fri, 1 Jun 2012 09:26:08 -0700 (PDT)
User-agent: G2/1.0

On Jun 1, 12:03 pm, Xah Lee <address@hidden> wrote:
> On May 31, 10:43 pm, rusi <address@hidden> wrote:
>
> > On Jun 1, 9:23 am, Jason Rumney <address@hidden> wrote:
>
> > > On Thursday, 31 May 2012 01:15:11 UTC+8, Buchs, Kevin  wrote:
> > > > Xah suggested I embrace Unicode. So I could use (prefer-coding-system
> > > > 'utf-8) or the file variable: -*- coding: utf-8 -*-. Are there drawbacks
> > > > to the former? What about opening an ASCII coded file? Can emacs
> > > > properly detect it or does it come up as UTF-8?
>
> > > ASCII is a subset of UTF-8, so the problem you are imagining does not 
> > > exist.
>
> > This does not exactly work that way on windows.
> > eg recently saw a description of how notepad put a BOM mark in a
> > haskell-script which made the haskell scripts unrunnable
>
> haskell compiler probably should bear the blame. Last i read (~4 years
> ago), the lang spec says source code should be unicode (i forgot if it
> specified a encoding), however, no haskell compiler at the time
> supports it. If your lang spec says unicode, you have to support BOM
> mark.
>
> 〈Unicode BOM Byte Order Mark Hack〉
> http://xahlee.org/comp/unicode_BOM_byte_orde_mark.html
>
> http://www.unicode.org/faq/utf_bom.html#bom1
>
>  Xah

See http://www.unicode.org/versions/Unicode5.0.0/ch02.pdf
(pg 36) "Use of a BOM is neither required nor recommended for UTF-8,
but may be encountered in contexts where UTF-8 data is converted from
other encoding forms..."
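
To illustrate how a BOM can end up in UTF-8 data after conversion, here is
a minimal Python sketch. It assumes a naive converter that decodes UTF-16LE
without consuming the leading U+FEFF; the function name
naive_utf16le_to_utf8 and the sample input are made up for illustration.

```python
import codecs

def naive_utf16le_to_utf8(data: bytes) -> bytes:
    # Decoding with "utf-16-le" does not strip a leading U+FEFF,
    # so the BOM survives as an ordinary character in the text.
    text = data.decode("utf-16-le")
    return text.encode("utf-8")

# Hypothetical input: UTF-16LE data that starts with a BOM.
utf16_file = codecs.BOM_UTF16_LE + "#!/bin/sh\n".encode("utf-16-le")
utf8_file = naive_utf16le_to_utf8(utf16_file)

# The converted data now begins with the UTF-8 form of U+FEFF (EF BB BF).
print(utf8_file.startswith(codecs.BOM_UTF8))
```

Decoding with the plain "utf-16" codec instead would consume the BOM, so
whether one shows up depends on how the conversion was done.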

More specifically, on why a BOM is not recommended for UTF-8, see
http://www.unicode.org/faq/utf_bom.html
"Note that some recipients of UTF-8 encoded data do not expect a BOM.
Where UTF-8 is used transparently in 8-bit environments, the use of a
BOM will interfere with any protocol or file format that expects
specific ASCII characters at the beginning, such as the use of "#!"
at the beginning of Unix shell scripts."
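
The "#!" case can be sketched in Python as follows. This is only an
illustration, not how any particular tool handles it: the helper names
starts_with_shebang and strip_utf8_bom and the sample script are my own
assumptions.

```python
import codecs

def starts_with_shebang(data: bytes) -> bool:
    # A "#!" line is only recognized if these two ASCII bytes
    # are the very first bytes of the file.
    return data.startswith(b"#!")

def strip_utf8_bom(data: bytes) -> bytes:
    # Drop a leading EF BB BF if present; leave everything else alone.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

script = "#!/bin/sh\necho hi\n"
plain = script.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain

# Pure ASCII text has the same bytes in ASCII and in UTF-8.
print(plain == script.encode("ascii"))
# With a BOM in front, the file no longer starts with "#!".
print(starts_with_shebang(with_bom))
# After stripping the BOM, it does again.
print(starts_with_shebang(strip_utf8_bom(with_bom)))
```

So a plain ASCII file read as UTF-8 is unaffected; the trouble comes only
from the three extra bytes an editor may write at the start.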

