Re: BOM mark from Windows notepad

lilypond-devel


From:	Hans Aberg
Subject:	Re: BOM mark from Windows notepad
Date:	Fri, 13 Nov 2009 14:09:43 +0100

On 13 Nov 2009, at 12:25, Bertalan Fodor (LilyPondTool) wrote:

Hehe, we've got this:

<INITIAL,chords,lyrics,figures,notes>{BOM_UTF8}/.* {
if (this->lexloc->line_number () != 1 || this->lexloc->column_number() != 0)
  {
    LexerError (_ ("stray UTF-8 BOM encountered").c_str ());
    exit (1);
That means we correctly parse the BOM, but exit if it is not the first character.

This link says that, although there is no Unicode protocol for such BOMs, the suggestion is to treat them as a zero-width space (the non-breaking part is not relevant here):

  http://unicode.org/faq/utf_bom.html#bom6

So, to follow that suggestion, that error code should be removed if you now want to admit BOMs.


  Hans

Hans Aberg wrote:
On 13 Nov 2009, at 10:08, Bertalan Fodor (LilyPondTool) wrote:
I think changing the LilyPond parser to support a BOM in the middle (i.e. not at the beginning) of the file is very hard. Actually, if it is not at the beginning, then it should be treated as a regular character, which might not be present just anywhere in the file.
Why would that be? Did you not have a Flex generated .l file? If the input .l file is in UTF-8 and Flex is in 8-bit mode, add a rule
 "<BOM>" {}
where <BOM> is the UTF-8 representation of the BOM. It will then act as a space, breaking tokens, but otherwise be ignored. So it acts as a zero-width space.
 Hans
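
To illustrate the behaviour such a rule would give, here is a minimal C++ sketch outside of Flex. It is not LilyPond code: the helper name tokenize_ignoring_bom and the whitespace-only token model are assumptions made purely for illustration. The only fact taken as given is that the UTF-8 encoding of the BOM (U+FEFF) is the byte sequence EF BB BF. The BOM breaks the current token, like a space, and is otherwise dropped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 encoding of U+FEFF (the byte order mark).
static const std::string kBomUtf8 = "\xEF\xBB\xBF";

// Hypothetical helper (not part of LilyPond): split the input into tokens,
// treating ASCII whitespace and any UTF-8 BOM as separators that produce
// no token of their own, i.e. the BOM acts as a zero-width space.
std::vector<std::string> tokenize_ignoring_bom(const std::string &input)
{
  std::vector<std::string> tokens;
  std::string current;

  // Emit the token collected so far, if any.
  auto flush = [&]() {
    if (!current.empty())
      {
        tokens.push_back(current);
        current.clear();
      }
  };

  for (std::size_t i = 0; i < input.size();)
    {
      if (input.compare(i, kBomUtf8.size(), kBomUtf8) == 0)
        {
          // A BOM anywhere in the input ends the current token and is skipped.
          flush();
          i += kBomUtf8.size();
        }
      else if (input[i] == ' ' || input[i] == '\t'
               || input[i] == '\n' || input[i] == '\r')
        {
          flush();
          ++i;
        }
      else
        {
          current += input[i];
          ++i;
        }
    }
  flush();
  return tokens;
}
```

With this sketch, a BOM at the start of the input is skipped silently, and a BOM between two words separates them just as a space would. Whether that is the right policy for LilyPond, rather than reporting a stray BOM as an error, is the question under discussion in this thread.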
