lilypond-user

Re: UTF-8 in MIDI Lyrics


From: Joseph Austin
Subject: Re: UTF-8 in MIDI Lyrics
Date: Sat, 25 Feb 2017 13:45:15 -0500


On Feb 25, 2017, at 11:41 AM, address@hidden wrote:

Date: Sat, 25 Feb 2017 17:34:54 +0100 (CET)
From: Karl Hammar <address@hidden>
To: Joseph Austin <address@hidden>
<snip>

And RP-26 clearly states in section 5:

In addition, if a byte order mark which specifies UNICODE such as
'FF FE' or 'FE FF' exists, the character code SET should be treated
 as UNICODE.

There is such a "byte order mark" for UTF-8, see [2]. By extension,
you just have to insert that BOM somewhere in the MIDI file ("exists"
means it is not restricted to the lyrics meta event; preferably put it
in track 0 at time 0), and it would be legal (according to the
recommendation) to use UTF-8 straight out of the box.

[2] http://www.unicode.org/faq/utf_bom.html#BOM

<snip>

only ASCII chars between 0 and 127 are allowed.

Your wording is too strong. complete_midi_96-1-3.pdf, p.137 (or [1]
p.10) clearly says "should", but adds:

"other character codes
using the high-order bit may be used for interchange of files between
different programs on the same computer which supports an extended
character set. Programs on a computer which does not support
non-ASCII characters should ignore those characters."

I stand corrected.

But if we are going to use a "private standard", we might as well
imitate the "official" standard and insert something like
FF 05 07 { @ U T F 8 }
And lobby AMEI/MMA to adopt an official UTF8 position.
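
For illustration, here is a minimal Python sketch of how the event suggested above would be serialized. The `{@UTF8}` tag is only the hypothetical tag proposed in this message, not something defined by RP-26 or the MIDI specification, and the helper names `vlq` and `meta_event` are made up for this example.

```python
def vlq(n: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = [n & 0x7F]              # last byte: high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)  # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(out))


def meta_event(meta_type: int, data: bytes) -> bytes:
    """Build the body of a meta event: FF <type> <length> <data>."""
    return bytes([0xFF, meta_type]) + vlq(len(data)) + data


# Hypothetical tag from this thread; 0x05 is the Lyric meta event type.
tag_event = meta_event(0x05, b"{@UTF8}")
print(tag_event.hex(" ").upper())
```

The tag is seven ASCII bytes, which is where the length byte 07 in "FF 05 07" comes from.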

Could be good, but why not just capitalize on the BOM and use
UTF-8?

Regards,
/Karl Hammar

OK, the UTF-8 BOM is the byte sequence EF BB BF (hex).
But given that the MIDI file is not a "text file" but a binary file with text fields scattered throughout,
normally embedded in various MIDI Meta-events, where should the BOM be placed?

Interpreting your suggestion, we could add a Lyric Meta-Event at Track 0, Time 0, with the BOM as its text field.
That should work for lyrics, but RP-26 indicates that the lyrics "language encoding" should not extend to other types of text events.
For other text events, it seems we would need to prefix every UTF-8 text field with the BOM.
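
A rough Python sketch of both ideas, for discussion only. It assumes a reader falls back to Latin-1 when no BOM is present (my assumption, not something RP-26 or the MIDI spec says), and all function names here are hypothetical.

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def vlq(n: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if n < 0:
        raise ValueError("value must be non-negative")
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def meta_event(delta: int, meta_type: int, data: bytes) -> bytes:
    """Track event: <delta-time> FF <type> <length> <data>."""
    return vlq(delta) + bytes([0xFF, meta_type]) + vlq(len(data)) + data


def bom_lyric_marker() -> bytes:
    """Lyric meta event (type 05) at delta 0 whose only text is the BOM."""
    return meta_event(0, 0x05, BOM)


def text_event_with_bom(delta: int, meta_type: int, text: str) -> bytes:
    """Any text-type meta event with the BOM prefixed to its UTF-8 text."""
    return meta_event(delta, meta_type, BOM + text.encode("utf-8"))


def decode_text_field(data: bytes, utf8_declared: bool = False) -> str:
    """Decode a text field; a leading BOM always selects UTF-8."""
    if data.startswith(BOM):
        return data[len(BOM):].decode("utf-8")
    if utf8_declared:             # e.g. a BOM lyric marker was seen earlier
        return data.decode("utf-8")
    return data.decode("latin-1")  # assumed fallback, not from any spec


print(bom_lyric_marker().hex(" ").upper())
print(text_event_with_bom(0, 0x01, "é").hex(" ").upper())
```

The per-field prefix costs three bytes per text event, but each field can then be decoded without any state carried over from Track 0.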
---
Joe Austin
