Re: [Chicken-users] BOM in a Scheme source file


From: John Cowan
Subject: Re: [Chicken-users] BOM in a Scheme source file
Date: Sun, 9 Sep 2007 21:24:59 -0400
User-agent: Mutt/1.5.13 (2006-08-11)

Shawn Rutledge scripsit:

> Instead, you think Scite should assume that when it sees any bytes
> with the MSB set, the file is UTF-8?  Or there is a better way to
> detect it?

There is no *guaranteed correct* way to detect UTF-8, because a file in
Latin-1 (or in various other 8859-x encodings) can contain any possible
sequence of bytes.  Looking at part of the input to see if it contains
only UTF-8-valid byte sequences is a good heuristic; looking at the
first few bytes of the input to see if it contains a BOM is a better one.
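
As a rough illustration of the two heuristics just described, here is a minimal Python sketch. The function name `guess_encoding` and the choice to fall back to Latin-1 are assumptions made for the example; this is not how Scite, Vim, or Chicken actually behave.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess an encoding for data; a heuristic, not a guarantee."""
    # Heuristic 1: a leading UTF-8 BOM (bytes EF BB BF) is a strong hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Heuristic 2: accept the input as UTF-8 if it contains only
    # valid UTF-8 byte sequences (pure ASCII passes trivially).
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for this sketch: treat it as Latin-1,
        # which can decode any byte sequence.
        return "latin-1"
    return "utf-8"
```

Either check can be fooled. The two bytes C3 A9 are valid UTF-8 for "é" and also valid Latin-1 for "Ã©", and a Latin-1 file may legitimately begin with the three BOM bytes, which read as "ï»¿" in that encoding.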

> Vim recognizes the UTF-8 sequences correctly with or without the BOM;
> and if I save the file, it will preserve the BOM if it was there
> initially, but will not add the BOM if it was absent.

A good heuristic, but there must exist cases that can deceive it.

> It would be nice if Chicken was tolerant as well, since the BOM is so common.

+1
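
One way a source reader could be made tolerant is to drop a leading BOM before parsing. The Python sketch below only illustrates that idea; `strip_utf8_bom` is a hypothetical helper, not part of Chicken.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    # No BOM at the start: leave the input untouched.
    return data
```

Input without a BOM passes through unchanged, so the same code path serves both kinds of file.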

-- 
"Well, I'm back."  --Sam        John Cowan <address@hidden>



