From: Henri Chorand
Subject: [Freecats-Dev] About Unicode
Date: Thu, 13 Feb 2003 09:55:09 +0100
Hi all,
Sooner or later, we'll have to learn more about Unicode (well, more than I
currently know).
A brief look at http://www.unicode.org/ convinced me brief is not enough.
The two-level FAQ (at http://www.unicode.org/faq/utf_bom.html) seems very
interesting.
For those who still have some spare time, the reference book is freely
available online at:
http://www.unicode.org/uni2book/u2.html
One possible source of concern with Unicode is that there are so many
flavors, as seen in the FAQ:
> Which do I need to be able to use from:
> UTF8, UTF16, UTF16LE, UTF16BE, UTF32,
> UTF32LE, UTF32BE?
Things seem to get worse when you read the answer:
> Hard to say. UTF-8 will be most common on the web.
> UTF16, UTF16LE, UTF16BE are used by Java and
> Windows.
> UTF32, UTF32LE, UTF32BE are used by various Unix
> systems.
> Luckily, the conversions between all of them are
> algorithmically based and fast.
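As a rough illustration of that last point, here is a minimal sketch. It assumes a Python 3 interpreter (not the Python available when this message was written) and its standard codecs; the sample string and the function name are made up for the example.

```python
# Round-trip one string through the encodings named in the FAQ.
# Assumption: Python 3, where str is a sequence of Unicode code points.
SAMPLE = "caf\u00e9 \u20ac"  # hypothetical sample: "cafe" with e-acute, space, euro sign

ENCODINGS = ["utf-8", "utf-16", "utf-16-le", "utf-16-be",
             "utf-32", "utf-32-le", "utf-32-be"]


def roundtrip_sizes(text):
    """Encode text with each codec, check it decodes back, return byte sizes."""
    sizes = {}
    for name in ENCODINGS:
        data = text.encode(name)
        # Decoding with the same codec must give back the original text.
        assert data.decode(name) == text
        sizes[name] = len(data)
    return sizes


if __name__ == "__main__":
    for name, size in roundtrip_sizes(SAMPLE).items():
        print(name, size)
```

The generic "utf-16" and "utf-32" codecs prepend a byte order mark when encoding, which is why they come out slightly larger than their explicit LE/BE variants.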
And for the curious folks who want to experiment, Notepad on Windows 2000 /
XP offers the following encoding options when saving text files:
- ANSI
- Unicode
- Unicode big endian
- UTF-8
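Files saved with the three Unicode options start with a byte order mark (BOM), so they can be told apart by their first bytes. The sketch below assumes Python 3; the function name is hypothetical, and treating "ANSI" as cp1252 is an assumption that only holds on Western European Windows setups.

```python
import codecs

# BOMs assumed to be written by Notepad for each Unicode save option,
# paired with a Python codec that consumes the BOM while decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),    # "UTF-8"
    (codecs.BOM_UTF16_LE, "utf-16"),   # "Unicode"
    (codecs.BOM_UTF16_BE, "utf-16"),   # "Unicode big endian"
]


def guess_notepad_encoding(data, ansi_codec="cp1252"):
    """Return a codec name for bytes saved by Notepad, based on the BOM.

    Without a BOM the file is assumed to be "ANSI"; cp1252 is only a
    guess at the system code page.
    """
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec
    return ansi_codec


if __name__ == "__main__":
    raw = codecs.BOM_UTF16_BE + "hello".encode("utf-16-be")
    codec = guess_notepad_encoding(raw)
    print(codec, raw.decode(codec))
```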
Well, as usual, if somebody happens to know Unicode well enough to provide a
few directions, please <shout mode on>DO SO!</shout mode off>
In a nutshell, what we need to know is:
- little endian/big endian issues between Macs, Windows PC & Unix boxes
(Linux/BSD PC for a start)
- how Python handles these by default (it would be handy if the language
knew how to manage these issues)
- "preferred" encodings among the above (I guess, one in which character
length does not vary)
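A few of these questions can be probed directly. The sketch below assumes Python 3 and uses hypothetical helper names: it shows that the generic "utf-16" codec writes a BOM in the machine's native byte order, that decoding honors whichever BOM is present, and that only UTF-32 uses the same number of bytes for every code point.

```python
import codecs
import sys


def utf16_with_bom(text):
    """Encode with the generic codec: BOM first, then native byte order."""
    return text.encode("utf-16")


def decode_any_utf16(data):
    """Decode UTF-16 bytes from any platform, relying on the leading BOM."""
    return data.decode("utf-16")


def width_in_bytes(ch, codec):
    """Number of bytes one character takes in a BOM-less codec."""
    return len(ch.encode(codec))


if __name__ == "__main__":
    print("native byte order:", sys.byteorder)
    print(utf16_with_bom("A"))
    # "A" and a character outside the 16-bit range (musical G clef).
    for ch in ("A", "\U0001D11E"):
        print([width_in_bytes(ch, c)
               for c in ("utf-8", "utf-16-le", "utf-32-le")])
```

If every file carries a BOM, the Mac/Windows/Unix byte-order difference is handled by the codec at decode time; files without a BOM need the byte order stated explicitly ("utf-16-le" or "utf-16-be").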
A "typically optimistic" extract:
> Hybrid systems in which UTF-16 is used as a disk storage
> format but expanding to UTF-32 in memory is also a
> popular solution combining small long term storage space
> with ease of processing.
Had this stuff been designed with ease of use in mind... ;-)
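The hybrid scheme from that extract can be sketched as follows, assuming Python 3. The function names are hypothetical, and a plain list of integers stands in for a UTF-32 buffer in memory.

```python
import os
import tempfile


def save_utf16(path, text):
    """Store text on disk as UTF-16 (the codec writes a BOM first)."""
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


def load_code_points(path):
    """Read a UTF-16 file and expand it to one integer per code point."""
    with open(path, "r", encoding="utf-16", newline="") as f:
        return [ord(ch) for ch in f.read()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.txt")
        save_utf16(path, "A\U0001D11E")
        print(os.path.getsize(path), load_code_points(path))
```

In memory, each list element is one full code point regardless of how many 16-bit units it took on disk, which is the "ease of processing" the extract refers to.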
Anyway, if it's too difficult to master, we may begin with a Windows ANSI
version.
Let me know your thoughts.
Regards,
Henri