lilypond-user

How can I avoid Latin1 and use UTF-8?


From: Hans Aberg
Subject: How can I avoid Latin1 and use UTF-8?
Date: Mon, 5 Sep 2005 12:06:39 +0200

Was: How can I avoid unicode and use Latin1?

On 5 Sep 2005, at 08:58, address@hidden wrote:

> Thank you. I didn't know unicode was broader than UTF-8.

Formally, Unicode assigns to each abstract character a distinct non-negative integer, called a "code point" in Unicode lingo. To get this into a computer, one also needs a function translating those integers into binary data, and that is what UTF-8 does. Different translation functions, such as UTF-8, UTF-16 and UTF-32, provide different encodings of the same code points.
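
As a rough illustration of such an integer-to-binary translation, here is a minimal Python sketch of the UTF-8 rules in their later, at most four-byte form. The helper name utf8_encode is hypothetical and written only for this example; the demo compares its output with Python's built-in codecs and prints the UTF-16 and UTF-32 bytes alongside, so one code point can be seen getting three different encodings.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: translate one code point into UTF-8 bytes (at most 4)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:        # up to 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # A, e-acute, euro sign, musical G clef: same code points, three encodings each
    for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        cp = ord(ch)
        assert utf8_encode(cp) == ch.encode("utf-8")  # agrees with the built-in codec
        print(f"U+{cp:04X}  UTF-8: {utf8_encode(cp).hex(' '):12}  "
              f"UTF-16: {ch.encode('utf-16-be').hex(' '):12}  "
              f"UTF-32: {ch.encode('utf-32-be').hex(' ')}")
```

Only the leading bits of each byte change between the length classes; the code point's own bits are simply distributed over the "x" positions.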

> The 3-byte value
> 10FFFF (rather than FFFFFF) seems like a rather strange upper limit,

When UTF-16 was designed, the separation above between code points and their encodings was not thought through clearly, so this upper limit was believed to be necessary: UTF-16 reaches code points above FFFF only through surrogate pairs, which cover 16 extra planes of 65536 code points each, so nothing above 10FFFF can be represented. The limitation is, however, imposed by Unicode Inc.; the original ISO UTF-8 does not have it (it allows values up to 7FFFFFFF), so there are two differing versions of UTF-8 in play. Also, the available code point range is large enough that, at the current rate of character additions, the characters defined by Unicode Inc. will not fill it for hundreds of years. It could only run out if people were allowed to register private characters on a massive scale.
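
The arithmetic behind the 10FFFF limit can be checked with a small Python sketch of UTF-16 surrogate pairs. The name utf16_surrogates is hypothetical, for illustration only; the 7FFFFFFF figure assumes the original ISO 10646 form of UTF-8 (up to six bytes, as in RFC 2279), as opposed to the restricted four-byte form of RFC 3629.

```python
from typing import Tuple

BMP_SIZE = 0x10000       # code points 0000..FFFF fit in a single 16-bit unit
SUPPLEMENTARY = 2 ** 20  # 10 bits from the high surrogate + 10 from the low one

# Highest code point UTF-16 can reach: the BMP plus 16 planes of 65536 each.
UTF16_MAX = BMP_SIZE + SUPPLEMENTARY - 1   # 0x10FFFF
# Assumption: the original ISO 10646 UTF-8 (up to six bytes) carried 31 bits.
ORIGINAL_UTF8_MAX = 2 ** 31 - 1            # 0x7FFFFFFF


def utf16_surrogates(cp: int) -> Tuple[int, int]:
    """Hypothetical helper: split a code point above FFFF into a surrogate pair."""
    if not BMP_SIZE <= cp <= UTF16_MAX:
        raise ValueError(f"{cp:#x} is not a supplementary code point")
    offset = cp - BMP_SIZE               # 20-bit value 0..FFFFF
    high = 0xD800 + (offset >> 10)       # top 10 bits    -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> DC00..DFFF
    return high, low


if __name__ == "__main__":
    print(f"UTF-16 limit: {UTF16_MAX:X}, original UTF-8 limit: {ORIGINAL_UTF8_MAX:X}")
    high, low = utf16_surrogates(0x1D11E)  # musical symbol G clef
    print(f"U+1D11E -> {high:X} {low:X}")
```

Since the pair carries exactly 20 payload bits, no code point above 10FFFF fits, which is why that value, rather than a "round" 3-byte figure like FFFFFF, became the limit.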

> but
> that only points up the fact that I'm going to have to learn about unicode
> once I get through my current arranging binge.

You can read about UTF-8 at
  http://www.cl.cam.ac.uk/~mgk25/unicode.html

>> Today, Windows uses Unicode exclusively -- even in North America. You
>> won't have big success with latin1 files.

> I routinely switch files between Latin1 text and MS-Word docs with no
> problem whatsoever. ... Microsoft's unicode claims are a marketing ploy; Latin1 still
> rules.

Editors often have a preference setting where the default encoding can be chosen, and the output encoding can also be chosen automatically. For example, the mailer I use scans through the email and picks a suitable encoding: ASCII, ISO-Latin-1, or UTF-8.
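
A minimal Python sketch of the kind of choice such a mailer makes, assuming it simply tries the narrowest encoding first. The names pick_encoding and latin1_to_utf8 are hypothetical, not the API of any particular mailer or of LilyPond; the second helper shows the conversion the subject line asks about, re-encoding Latin-1 bytes as UTF-8.

```python
def pick_encoding(text: str) -> str:
    """Hypothetical helper: return the narrowest of ASCII, ISO-Latin-1, UTF-8 that fits."""
    for enc in ("ascii", "latin-1"):
        try:
            text.encode(enc)
        except UnicodeEncodeError:
            continue          # some character lies outside this repertoire
        return enc
    return "utf-8"            # UTF-8 can encode every Unicode character


def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode a Latin-1 byte string as UTF-8."""
    # Every byte 00..FF is a valid Latin-1 character, so decoding never fails.
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    for name in ["Bach", "H\u00e4ndel", "Dvo\u0159\u00e1k"]:
        print(name, pick_encoding(name))
    print(latin1_to_utf8(b"H\xe4ndel"))
```

Trying the narrowest encoding first keeps plain-ASCII text readable by older software, while anything outside Latin-1 (such as the "\u0159" in Dvořák) falls through to UTF-8.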

  Hans Aberg





